Updated on 2026.04.03
LLM inference
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-04-02 | Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding | Tao Jin et.al. | 2604.02047 | null |
| 2026-04-02 | DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72 | Wanqian Li et.al. | 2604.01621 | null |
| 2026-04-01 | Fast and Accurate Probing of In-Training LLMs’ Downstream Performances | Zhichen Liu et.al. | 2604.01025 | null |
| 2026-04-01 | Learning from Many and Adapting to the Unknown in Open-set Test Streams | Xiao Zhang et.al. | 2604.00533 | null |
| 2026-04-01 | Scheduling LLM Inference with Uncertainty-Aware Output Length Predictions | Haoyu Zheng et.al. | 2604.00499 | null |
| 2026-04-01 | TENT: A Declarative Slice Spraying Engine for Performant and Resilient Data Movement in Disaggregated LLM Serving | Feng Ren et.al. | 2604.00368 | null |
| 2026-03-31 | ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving | Annette Taberner-Miller et.al. | 2604.00136 | null |
| 2026-03-30 | Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference | Zifan He et.al. | 2603.29002 | null |
| 2026-03-24 | StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving | Azam Nouri et.al. | 2603.28795 | null |
| 2026-03-30 | A Techno-Economic Framework for Cost Modeling and Revenue Opportunities in Open and Programmable AI-RAN | Gabriele Gemmi et.al. | 2603.28680 | null |
| 2026-03-30 | Tiered Super-Moore’s Law: Price Evolution, Production Frontiers, and Market Competition in Large Language Model Inference Services | Mingdeng Du et.al. | 2603.28576 | null |
| 2026-03-31 | A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network | Aojie Jiang et.al. | 2603.28239 | null |
| 2026-03-31 | ITQ3_S: High-Fidelity 3-bit LLM Inference via Interleaved Ternary Quantization with Rotation-Domain Smoothing | Edward J. Yoon et.al. | 2603.27914 | null |
| 2026-03-29 | KVSculpt: KV Cache Compression as Distillation | Bo Jiang et.al. | 2603.27819 | null |
| 2026-03-28 | From Inference Routing to Agent Orchestration: Declarative Policy Compilation with Cross-Layer Verification | Huamin Chen et.al. | 2603.27299 | null |
| 2026-03-28 | ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference | Qiuyang Zhang et.al. | 2603.27138 | null |
| 2026-03-27 | MemBoost: A Memory-Boosted Framework for Cost-Aware LLM Inference | Joris Köster et.al. | 2603.26557 | null |
| 2026-03-27 | Rocks, Pebbles and Sand: Modality-aware Scheduling for Multimodal Large Language Model Inference | Konstantinos Papaioannou et.al. | 2603.26498 | null |
| 2026-03-27 | AgentCollab: A Self-Evaluation-Driven Collaboration Paradigm for Efficient LLM Agents | Wenbo Gao et.al. | 2603.26034 | null |
| 2026-03-26 | Supercharging Federated Intelligence Retrieval | Dimitris Stripelis et.al. | 2603.25374 | null |
| 2026-03-26 | Interpretable Zero-shot Referring Expression Comprehension with Query-driven Scene Graphs | Yike Wu et.al. | 2603.25004 | null |
| 2026-03-25 | LATS: Large Language Model Assisted Teacher-Student Framework for Multi-Agent Reinforcement Learning in Traffic Signal Control | Yifeng Zhang et.al. | 2603.24361 | null |
| 2026-03-25 | Self-Distillation for Multi-Token Prediction | Guoliang Zhao et.al. | 2603.23911 | null |
| 2026-03-24 | The Diminishing Returns of Early-Exit Decoding in Modern LLMs | Rui Wei et.al. | 2603.23701 | null |
| 2026-03-24 | Energy Efficient Software Hardware CoDesign for Machine Learning: From TinyML to Large Language Models | Mohammad Saleh Vahdatpour et.al. | 2603.23668 | null |
| 2026-03-24 | LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load | Pranay Tummalapalli et.al. | 2603.23640 | null |
| 2026-03-24 | Sparser, Faster, Lighter Transformer Language Models | Edoardo Cetin et.al. | 2603.23198 | null |
| 2026-03-24 | Characterizing CPU-Induced Slowdowns in Multi-GPU LLM Inference | Euijun Chung et.al. | 2603.22774 | null |
| 2026-03-23 | Chimera: Latency- and Performance-Aware Multi-agent Serving for Heterogeneous LLMs | Kangqi Ni et.al. | 2603.22206 | null |
| 2026-03-23 | GSEM: Graph-based Self-Evolving Memory for Experience Augmented Clinical Reasoning | Xiao Han et.al. | 2603.22096 | null |
| 2026-03-23 | CurvZO: Adaptive Curvature-Guided Sparse Zeroth-Order Optimization for Efficient LLM Fine-Tuning | Shuo Wang et.al. | 2603.21725 | null |
| 2026-03-25 | PRISM: Breaking the O(n) Memory Wall in Long-Context LLM Inference via O(1) Photonic Block Selection | Hyoseok Park et.al. | 2603.21576 | null |
| 2026-03-22 | TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference | Jaber Jaber et.al. | 2603.21365 | null |
| 2026-03-22 | The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project | Huamin Chen et.al. | 2603.21354 | null |
| 2026-03-22 | Improving Coherence and Persistence in Agentic AI for System Optimization | Pantea Karimi et.al. | 2603.21321 | null |
| 2026-03-22 | CALVO: Improve Serving Efficiency for LLM Inferences with Intense Network Demands | Weiye Wang et.al. | 2603.21257 | null |
| 2026-03-22 | Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs | Zihui Chen et.al. | 2603.21155 | null |
| 2026-03-24 | WWW.Serve: Interconnecting Global LLM Services through Decentralization | Huanyu Wang et.al. | 2603.20661 | null |
| 2026-03-20 | KV Cache Optimization Strategies for Scalable and Efficient LLM Inference | Yichun Xu et.al. | 2603.20397 | null |
| 2026-03-20 | Utility-Guided Agent Orchestration for Efficient LLM Tool Use | Boyan Liu et.al. | 2603.19896 | null |
| 2026-03-20 | Stepwise: Neuro-Symbolic Proof Search for Automated Systems Verification | Baoding He et.al. | 2603.19715 | null |
| 2026-03-20 | HyEvo: Self-Evolving Hybrid Agentic Workflows for Efficient Reasoning | Beibei Xu et.al. | 2603.19639 | null |
| 2026-03-19 | A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference | Yida Zhang et.al. | 2603.19133 | null |
| 2026-03-19 | BeamAgent: LLM-Aided MIMO Beamforming with Decoupled Intent Parsing and Alternating Optimization for Joint Site Selection and Precoding | Xiucheng Wang et.al. | 2603.18855 | null |
| 2026-03-19 | From Servers to Sites: Compositional Power Trace Generation of LLM Inference for Infrastructure Planning | Grant Wilkins et.al. | 2603.18383 | null |
| 2026-03-18 | Conflict-Free Policy Languages for Probabilistic ML Predicates: A Framework and Case Study with the Semantic Router DSL | Xunzhuo Liu et.al. | 2603.18174 | null |
| 2026-03-18 | Multi-Trait Subspace Steering to Reveal the Dark Side of Human-AI Interaction | Xin Wei Chia et.al. | 2603.18085 | null |
| 2026-03-17 | NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference | Zhaohui Geoffrey Wang et.al. | 2603.18046 | null |
| 2026-03-18 | RAMP: Reinforcement Adaptive Mixed Precision Quantization for Efficient On Device LLM Inference | Arpit Singh Gautam et.al. | 2603.17891 | null |
| 2026-03-18 | Swarm: Co-Activation Aware KVCache Offloading Across Multiple SSDs | Tuowei Wang et.al. | 2603.17803 | null |
| 2026-03-18 | Multi-stage Flow Scheduling for LLM Serving | Yijun Sun et.al. | 2603.17456 | null |
| 2026-03-18 | ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression | Ruibo Fan et.al. | 2603.17435 | null |
| 2026-03-18 | OmniVLN: Omnidirectional 3D Perception and Token-Efficient LLM Reasoning for Visual-Language Navigation across Air and Ground Platforms | Zhongyuang Liu et.al. | 2603.17351 | null |
| 2026-03-18 | IEMAS: An Incentive-Efficiency Routing Framework for Open Agentic Web Ecosystems | Hongze Liu et.al. | 2603.17302 | null |
| 2026-03-18 | The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency | Huamin Chen et.al. | 2603.17280 | null |
| 2026-03-17 | An End-to-End Framework for Functionality-Embedded Provenance Graph Construction and Threat Interpretation | Kushankur Ghosh et.al. | 2603.17100 | null |
| 2026-03-17 | FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism | Huamin Chen et.al. | 2603.16514 | null |
| 2026-03-17 | Efficient LLM Serving for Agentic Workflows: A Data Systems Perspective | Noppanat Wadlom et.al. | 2603.16104 | null |
| 2026-03-18 | Resource Consumption Threats in Large Language Models | Yuanhe Zhang et.al. | 2603.16068 | null |
| 2026-03-17 | inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference | Huamin Chen et.al. | 2603.16054 | null |
| 2026-03-16 | BANGLASOCIALBENCH: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction | Tanvir Ahmed Sijan et.al. | 2603.15949 | null |
| 2026-03-16 | SFCoT: Safer Chain-of-Thought via Active Safety Evaluation and Calibration | Yu Pan et.al. | 2603.15397 | null |
| 2026-03-16 | SkipOPU: An FPGA-based Overlay Processor for Large Language Models with Dynamically Allocated Computation | Zicheng He et.al. | 2603.14785 | null |
| 2026-03-16 | AgentTrace: Causal Graph Tracing for Root Cause Analysis in Deployed Multi-Agent Systems | Zhaohui Geoffrey Wang et.al. | 2603.14688 | null |
| 2026-03-15 | Governing Dynamic Capabilities: Cryptographic Binding and Reproducibility Verification for AI Agent Tool Use | Ziling Zhou et.al. | 2603.14332 | null |
| 2026-03-14 | SVD Contextual Sparsity Predictors for Fast LLM Inference | Georgii Serbin et.al. | 2603.14110 | null |
| 2026-03-17 | APEX-Searcher: Augmenting LLMs’ Search Capabilities through Agentic Planning and Execution | Kun Chen et.al. | 2603.13853 | null |
| 2026-03-14 | Retrieval-Feedback-Driven Distillation and Preference Alignment for Efficient LLM-based Query Expansion | Minghan Li et.al. | 2603.13776 | null |
| 2026-03-13 | Orla: A Library for Serving LLM-Based Multi-Agent Systems | Rana Shahout et.al. | 2603.13605 | null |
| 2026-03-13 | Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference | Huamin Chen et.al. | 2603.13426 | null |
| 2026-03-17 | Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking | Zizhao Mo et.al. | 2603.12831 | null |
| 2026-03-13 | Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation | Yichen Zhang et.al. | 2603.12793 | null |
| 2026-03-13 | ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning | Shuo Yang et.al. | 2603.12740 | null |
| 2026-03-13 | Cost-Efficient Multimodal LLM Inference via Cross-Tier GPU Heterogeneity | Donglin Yu et.al. | 2603.12707 | null |
| 2026-03-13 | 98× Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router | Xunzhuo Liu et.al. | 2603.12646 | null |
| 2026-03-13 | When Drafts Evolve: Speculative Decoding Meets Online Learning | Yu-Yang Qian et.al. | 2603.12617 | null |
| 2026-03-12 | TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition | Prabhu Vellaisamy et.al. | 2603.12465 | null |
| 2026-03-10 | Detecting Miscitation on the Scholarly Web through LLM-Augmented Text-Rich Graph Learning | Huidong Wu et.al. | 2603.12290 | null |
| 2026-03-12 | IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL | Zhoujun Cheng et.al. | 2603.12151 | null |
| 2026-03-12 | Where Matters More Than What: Decoding-aligned KV Cache Compression via Position-aware Pseudo Queries | Zhenxu Tian et.al. | 2603.11564 | null |
| 2026-03-11 | Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI | Yonas Atinafu et.al. | 2603.11340 | null |
| 2026-03-11 | Markovian Generation Chains in Large Language Models | Mingmeng Geng et.al. | 2603.11228 | null |
| 2026-03-11 | Leech Lattice Vector Quantization for Efficient LLM Compression | Tycho F. A. van der Ouderaa et.al. | 2603.11021 | null |
| 2026-03-11 | CacheSolidarity: Preventing Prefix Caching Side Channels in Multi-tenant LLM Serving Systems | Panagiotis Georgios Pennas et.al. | 2603.10726 | null |
| 2026-03-11 | S-HPLB: Efficient LLM Attention Serving via Sparsity-Aware Head Parallelism Load Balance | Di Liu et.al. | 2603.10353 | null |
| 2026-03-11 | MultiwayPAM: Multiway Partitioning Around Medoids for LLM-as-a-Judge Score Analysis | Chihiro Watanabe et.al. | 2603.10287 | null |
| 2026-03-10 | ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling | Dechuan Teng et.al. | 2603.09691 | null |
| 2026-03-10 | Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation | Luxi Lin et.al. | 2603.09527 | null |
| 2026-03-10 | PIM-SHERPA: Software Method for On-device LLM Inference by Resolving PIM Memory Attribute and Layout Inconsistencies | Sunjung Lee et.al. | 2603.09216 | null |
| 2026-03-10 | FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation | Yinpeng Wu et.al. | 2603.09046 | null |
| 2026-03-09 | Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning | Juming Xiong et.al. | 2603.08999 | null |
| 2026-03-09 | ConFu: Contemplate the Future for Better Speculative Sampling | Zongyue Qin et.al. | 2603.08899 | null |
| 2026-03-07 | Turn: A Language for Agentic Computation | Muyukani Kizito et.al. | 2603.08755 | null |
| 2026-03-09 | SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization | Yeonsik Park et.al. | 2603.08185 | null |
| 2026-03-09 | EAGLE-Pangu: Accelerator-Safe Tree Speculative Decoding on Ascend NPUs | Chang Han et.al. | 2603.08088 | null |
| 2026-03-09 | Deterministic Differentiable Structured Pruning for Large Language Models | Weiyu Huang et.al. | 2603.08065 | null |
| 2026-03-09 | DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention | Younjoo Lee et.al. | 2603.08026 | null |
| 2026-03-09 | Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization | Jingwei Li et.al. | 2603.08022 | null |
| 2026-03-09 | SageSched: Efficient LLM Scheduling Confronting Demand Uncertainty and Hybridity | Zhenghao Gan et.al. | 2603.07917 | null |
| 2026-03-09 | Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents | Jingbo Yang et.al. | 2603.07915 | null |
| 2026-03-08 | Temperature-Aware Scheduling of LLM Inference in Large-Scale Geo-Distributed Edge Data Centers with Distributed Optimization | Arash Khalatbarisoltani et.al. | 2603.07810 | null |
| 2026-03-08 | ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs | Yuzhuang Xu et.al. | 2603.07770 | null |
| 2026-03-06 | MoEless: Efficient MoE LLM Serving via Serverless Computing | Hanfei Yu et.al. | 2603.06350 | null |
| 2026-03-06 | LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis | Tao Zhang et.al. | 2603.05904 | null |
| 2026-03-06 | Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation | Changcheng Li et.al. | 2603.05881 | null |
| 2026-03-05 | Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks | Burak Topcu et.al. | 2603.05692 | null |
| 2026-03-05 | POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation | Zeju Qiu et.al. | 2603.05500 | null |
| 2026-03-05 | Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity | Di Zhang et.al. | 2603.05168 | null |
| 2026-03-05 | Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents | Natchanon Pollertlam et.al. | 2603.04814 | null |
| 2026-03-05 | Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator | Cong Li et.al. | 2603.04797 | null |
| 2026-03-05 | SLO-Aware Compute Resource Allocation for Prefill-Decode Disaggregated LLM Inference | Luchang Li et.al. | 2603.04716 | null |
| 2026-03-04 | Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows | Alfio Massimiliano Gliozzo et.al. | 2603.04241 | null |
| 2026-03-04 | A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality | Arther Tian et.al. | 2603.04028 | null |
| 2026-03-03 | From Language to Action: Can LLM-Based Agents Be Used for Embodied Robot Cognition? | Shinas Shaji et.al. | 2603.03148 | null |
| 2026-03-03 | SEALing the Gap: A Reference Framework for LLM Inference Carbon Estimation via Multi-Benchmark Driven Embodiment | Priyavanshi Pathania et.al. | 2603.02949 | null |
| 2026-03-03 | Agentic Self-Evolutionary Replanning for Embodied Navigation | Guoliang Li et.al. | 2603.02772 | null |
| 2026-03-03 | Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference | Yiqi Liu et.al. | 2603.02737 | null |
| 2026-03-03 | SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving | Sunghyeon Woo et.al. | 2603.02599 | null |
| 2026-03-02 | Beyond Microservices: Testing Web-Scale RCA Methods on GPU-Driven LLM Workloads | Dominik Scheinert et.al. | 2603.02057 | null |
| 2026-03-02 | Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning | Jiebin Zhang et.al. | 2603.01639 | null |
| 2026-03-02 | Graph-Based Self-Healing Tool Routing for Cost-Efficient LLM Agents | Neeraj Bholani et.al. | 2603.01548 | null |
| 2026-03-02 | Towards Privacy-Preserving LLM Inference via Collaborative Obfuscation (Technical Report) | Yu Lin et.al. | 2603.01499 | null |
| 2026-03-02 | Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study | Emmanuel Aboah Boateng et.al. | 2603.01486 | null |
| 2026-03-02 | SFCo-Nav: Efficient Zero-Shot Visual Language Navigation via Collaboration of Slow LLM and Fast Attributed Graph Alignment | Chaoran Xiong et.al. | 2603.01477 | null |
| 2026-03-02 | Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification | Guang Huang et.al. | 2603.01399 | null |
| 2026-02-27 | Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving | Ferran Agullo et.al. | 2602.24044 | null |
| 2026-02-27 | LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding | Alexander Samarin et.al. | 2602.23881 | null |
| 2026-02-27 | SLA-Aware Distributed LLM Inference Across Device-RAN-Cloud | Hariz Yet et.al. | 2602.23722 | null |
| 2026-02-26 | Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems | Siyuan Liu et.al. | 2602.23266 | null |
| 2026-02-26 | LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure | Jaehong Cho et.al. | 2602.23036 | null |
| 2026-02-26 | Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching | Hiroki Matsutani et.al. | 2602.22812 | null |
| 2026-02-26 | Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement | Shuchen Zhu et.al. | 2602.22681 | null |
| 2026-02-26 | Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning | Qin-Wen Luo et.al. | 2602.22642 | null |
| 2026-03-02 | FLYING SERVING: On-the-Fly Parallelism Switching for Large Language Model Serving | Shouwei Gao et.al. | 2602.22593 | null |
| 2026-02-25 | AutoQRA: Joint Optimization of Mixed-Precision Quantization and Low-rank Adapters for Efficient LLM Fine-Tuning | Changhai Zhou et.al. | 2602.22268 | null |
| 2026-02-25 | Sustainable LLM Inference using Context-Aware Model Switching | Yuvarani et.al. | 2602.22261 | null |
| 2026-02-25 | Small Wins Big: Comparing Large Language Models and Domain Fine-Tuned Models for Sarcasm Detection in Code-Mixed Hinglish Text | Bitan Majumder et.al. | 2602.21933 | null |
| 2026-02-25 | Multi-Layer Scheduling for MoE-Based LLM Reasoning | Yifan Sun et.al. | 2602.21626 | null |
| 2026-02-26 | DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference | Yongtong Wu et.al. | 2602.21548 | null |
| 2026-02-25 | Pancake: Hierarchical Memory System for Multi-Agent LLM Serving | Zhengding Hu et.al. | 2602.21477 | null |
| 2026-02-24 | SymTorch: A Framework for Symbolic Distillation of Deep Neural Networks | Elizabeth S. Z. Tan et.al. | 2602.21307 | null |
| 2026-02-24 | ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments | Haley Li et.al. | 2602.21140 | null |
| 2026-02-24 | CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference | Chao Fei et.al. | 2602.20732 | null |
| 2026-02-24 | OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services | Longxiang Wang et.al. | 2602.20595 | null |
| 2026-02-24 | FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill | Rakshith Jayanth et.al. | 2602.20515 | null |
| 2026-02-23 | KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem | Seongjin Cha et.al. | 2602.20217 | null |
| 2026-02-21 | MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs | Dongwei Wang et.al. | 2602.20191 | null |
| 2026-02-23 | ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads? | Ayush Nangia et.al. | 2602.19594 | null |
| 2026-02-22 | A Power Market Model with Hyperscalers and Modular Datacenters | Yihsu Chen et.al. | 2602.19310 | null |
| 2026-02-22 | Scaling Inference-Time Computation via Opponent Simulation: Enabling Online Strategic Adaptation in Repeated Negotiation | Xiangyu Liu et.al. | 2602.19309 | null |
| 2026-02-21 | WANSpec: Leveraging Global Compute Capacity for LLM Inference | Noah Martin et.al. | 2602.18931 | null |
| 2026-02-25 | BiScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS | Omar Basit et.al. | 2602.18755 | null |
| 2026-02-21 | HillInfer: Efficient Long-Context LLM Inference on the Edge with Hierarchical KV Eviction using SmartSSD | He Sun et.al. | 2602.18750 | null |
| 2026-02-24 | RPU – A Reasoning Processing Unit | Matthew Adiletta et.al. | 2602.18568 | null |
| 2026-02-20 | Dual-Tree LLM-Enhanced Negative Sampling for Implicit Collaborative Filtering | Jiayi Wu et.al. | 2602.18249 | null |
| 2026-02-24 | MASPO: Unifying Gradient Utilization, Probability Mass, and Signal Reliability for Robust and Sample-Efficient LLM Reasoning | Xiaoliang Fu et.al. | 2602.17550 | null |
| 2026-02-19 | Privacy-Preserving Mechanisms Enable Cheap Verifiable Inference of LLMs | Arka Pal et.al. | 2602.17223 | null |
| 2026-02-18 | Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark | Charalampos Mastrokostas et.al. | 2602.16811 | null |
| 2026-02-18 | Privacy-Aware Split Inference with Speculative Decoding for Large Language Models over Wide-Area Networks | Michael Cunningham et.al. | 2602.16760 | null |
| 2026-02-18 | FlowPrefill: Decoupling Preemption from Prefill Scheduling Granularity to Mitigate Head-of-Line Blocking in LLM Serving | Chia-chi Hsieh et.al. | 2602.16603 | null |
| 2026-02-18 | LLM-Driven Intent-Based Privacy-Aware Orchestration Across the Cloud-Edge Continuum | Zijie Su et.al. | 2602.16100 | null |
| 2026-02-17 | CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill | Bradley McDanel et.al. | 2602.16054 | null |
| 2026-02-17 | MoE-Spec: Expert Budgeting for Efficient Speculative Decoding | Bradley McDanel et.al. | 2602.16052 | null |
| 2026-02-17 | Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation | Shutian Gu et.al. | 2602.15724 | null |
| 2026-02-17 | LLM-as-Judge on a Budget | Aadirupa Saha et.al. | 2602.15481 | null |
| 2026-02-16 | Text Style Transfer with Parameter-efficient LLM Finetuning and Round-trip Translation | Ruoxi Liu et.al. | 2602.15013 | null |
| 2026-02-16 | Efficient Multi-round LLM Inference over Disaggregated Serving | Wenhao He et.al. | 2602.14516 | null |
| 2026-02-16 | WiSparse: Boosting LLM Inference Efficiency with Weight-Aware Mixed Activation Sparsity | Lei Chen et.al. | 2602.14452 | null |
| 2026-02-15 | HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming | Jiahui Chen et.al. | 2602.14214 | null |
| 2026-02-14 | ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System | Hao Kang et.al. | 2602.13692 | null |
| 2026-02-13 | Characterize LSM-tree Compaction Performance via On-Device LLM Inference | Jiabiao Ding et.al. | 2602.12669 | null |
| 2026-02-13 | Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats | Pengxiang Zhao et.al. | 2602.12635 | null |
| 2026-02-13 | TensorCommitments: A Lightweight Verifiable Inference for Language Models | Oguzhan Baser et.al. | 2602.12630 | null |
| 2026-02-12 | OServe: Accelerating LLM Serving via Spatial-Temporal Workload Orchestration | Youhe Jiang et.al. | 2602.12151 | null |
| 2026-02-12 | PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving | Sunghyeon Woo et.al. | 2602.12029 | null |
| 2026-02-12 | Predicting LLM Output Length via Entropy-Guided Representations | Huanyi Xie et.al. | 2602.11812 | null |
| 2026-02-12 | Deep Kernel Fusion for Transformers | Zixi Zhang et.al. | 2602.11808 | null |
| 2026-02-12 | GORGO: Maximizing KV-Cache Reuse While Minimizing Network Latency in Cross-Region LLM Load Balancing | Alessio Ricci Toniolo et.al. | 2602.11688 | null |
| 2026-02-12 | LoRA-based Parameter-Efficient LLMs for Continuous Learning in Edge-based Malware Detection | Christian Rondanini et.al. | 2602.11655 | null |
| 2026-02-12 | PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models | Eunyeong Cho et.al. | 2602.11530 | null |
| 2026-02-12 | PAM: Processing Across Memory Hierarchy for Efficient KV-centric LLM Serving System | Lian Liu et.al. | 2602.11521 | null |
| 2026-02-12 | Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt | Yujie Gu et.al. | 2602.11513 | null |
| 2026-02-12 | Cachemir: Fully Homomorphic Encrypted Inference of Generative Large Language Model with KV Cache | Ye Yu et.al. | 2602.11470 | null |
| 2026-02-12 | FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight | Jiayi Zhou et.al. | 2602.11136 | null |
| 2026-02-11 | Vulnerabilities in Partial TEE-Shielded LLM Inference with Precomputed Noise | Abhishek Saini et.al. | 2602.11088 | null |
| 2026-02-11 | BOute: Cost-Efficient LLM Serving with Heterogeneous LLMs and GPUs via Multi-Objective Bayesian Optimization | Youhe Jiang et.al. | 2602.10729 | null |
| 2026-02-12 | S-GRec: Personalized Semantic-Aware Generative Recommendation with Asymmetric Advantage | Jie Jiang et.al. | 2602.10606 | null |
| 2026-02-12 | QTALE: Quantization-Robust Token-Adaptive Layer Execution for LLMs | Kanghyun Noh et.al. | 2602.10431 | null |
| 2026-02-10 | Beyond SMILES: Evaluating Agentic Systems for Drug Discovery | Edward Wijaya et.al. | 2602.10163 | null |
| 2026-02-12 | Internalizing Multi-Agent Reasoning for Accurate and Efficient LLM-based Recommendation | Yang Wu et.al. | 2602.09829 | null |
| 2026-02-12 | Efficient Remote Prefix Fetching with GPU-native Media ASICs | Liang Mi et.al. | 2602.09725 | null |
| 2026-02-10 | MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering | Sieun Hyeon et.al. | 2602.09642 | null |
| 2026-02-10 | Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning | Zhida Jiang et.al. | 2602.09578 | null |
| 2026-02-10 | LLM-CoOpt: A Co-Design and Optimization Framework for Efficient LLM Inference on Heterogeneous Platforms | Jie Kong et.al. | 2602.09323 | null |
| 2026-02-09 | PABU: Progress-Aware Belief Update for Efficient LLM Agents | Haitao Jiang et.al. | 2602.09138 | null |
| 2026-02-09 | Benchmarking the Energy Savings with Speculative Decoding Strategies | Rohit Dutta et.al. | 2602.09113 | null |
| 2026-02-09 | FlattenGPT: Depth Compression for Transformer with Layer Flattening | Ruihan Xu et.al. | 2602.08858 | null |
| 2026-02-09 | Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems | Lang Feng et.al. | 2602.08847 | null |
| 2026-02-09 | QUOKA: Query-Oriented KV Selection For Efficient LLM Prefill | Dalton Jones et.al. | 2602.08722 | null |
| 2026-02-09 | Near-Oracle KV Selection via Pre-hoc Sparsity for Long-Context Inference | Yifei Gao et.al. | 2602.08329 | null |
| 2026-02-10 | Compiler-Assisted Speculative Sampling for Accelerated LLM Inference on Heterogeneous Edge Devices | Alejandro Ruiz y Mesa et.al. | 2602.08060 | null |
| 2026-02-08 | Accuracy-Delay Trade-Off in LLM Offloading via Token-Level Uncertainty | Yumin Kim et.al. | 2602.07958 | null |
| 2026-02-08 | MedCoG: Maximizing LLM Inference Density in Medical Reasoning via Meta-Cognitive Regulation | Yu Zhao et.al. | 2602.07905 | null |
| 2026-02-08 | Rethinking Latency Denial-of-Service: Attacking the LLM Serving Framework, Not the Model | Tianyi Wang et.al. | 2602.07878 | null |
| 2026-02-10 | ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs | Yanlin Qi et.al. | 2602.07721 | null |
| 2026-02-07 | A Two-Layer Framework for Joint Online Configuration Selection and Admission Control | Owen Shen et.al. | 2602.07663 | null |
| 2026-02-07 | Scout Before You Attend: Sketch-and-Walk Sparse Attention for Efficient LLM Inference | Hoang Anh Duy Le et.al. | 2602.07397 | null |
| 2026-02-07 | Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization | Chong Wang et.al. | 2602.07306 | null |
| 2026-02-06 | SpecAttn: Co-Designing Sparse Attention with Self-Speculative Decoding | Yikang Yue et.al. | 2602.07223 | null |
| 2026-02-06 | When RL Meets Adaptive Speculative Training: A Unified Training-Serving System | Junxiong Wang et.al. | 2602.06932 | null |
| 2026-02-06 | DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving | Ying Yuan et.al. | 2602.06502 | null |
| 2026-02-06 | Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making | Khurram Yamin et.al. | 2602.06286 | null |
| 2026-02-06 | RoPE-LIME: RoPE-Space Locality + Sparse-K Sampling for Efficient LLM Attribution | Isaac Picov et.al. | 2602.06275 | null |
| 2026-02-03 | PackInfer: Compute- and I/O-Efficient Attention for Batched LLM Inference | Rui Ning et.al. | 2602.06072 | null |
| 2026-02-05 | Towards Green AI: Decoding the Energy of LLM Inference in Software Development | Lola Solovyeva et.al. | 2602.05712 | null |
| 2026-02-05 | Determining Energy Efficiency Sweet Spots in Production LLM Inference | Hiari Pizzini Cavagna et.al. | 2602.05695 | null |
| 2026-02-05 | Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers | Jingkai Huang et.al. | 2602.05395 | null |
| 2026-02-05 | RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs | Youngcheon You et.al. | 2602.05367 | null |
| 2026-02-05 | TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference | Jiyoung Park et.al. | 2602.05145 | null |
| 2026-02-04 | GPU-to-Grid: Voltage Regulation via GPU Utilization Control | Zhirui Liang et.al. | 2602.05116 | null |
| 2026-02-04 | LinGO: A Linguistic Graph Optimization Framework with LLMs for Interpreting Intents of Online Uncivil Discourse | Yuan Zhang et.al. | 2602.04693 | null |
| 2026-02-04 | Harmonia: Algorithm-Hardware Co-Design for Memory- and Compute-Efficient BFP-based LLM Inference | Xinyu Wang et.al. | 2602.04595 | null |
| 2026-02-04 | LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding | Gang Lin et.al. | 2602.04541 | null |
| 2026-02-04 | Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning | Yansong Ning et.al. | 2602.04284 | null |
| 2026-02-04 | BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models | Junyu Chen et.al. | 2602.04163 | null |
| 2026-02-03 | MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling | Ning Ding et.al. | 2602.03359 | null |
| 2026-02-03 | DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM Inference | Jiancai Ye et.al. | 2602.03184 | null |
| 2026-02-03 | NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference | Jiangyong Yu et.al. | 2602.02988 | null |
| 2026-02-03 | Large-Scale LLM Inference with Heterogeneous Workloads: Prefill-Decode Contention and Asymptotically Optimal Control | Ruihan Lin et.al. | 2602.02987 | null |
| 2026-02-03 | 3D-Learning: Diffusion-Augmented Distributionally Robust Decision-Focused Learning | Jiaqi Wen et.al. | 2602.02943 | null |
| 2026-02-02 | A Single Revision Step Improves Token-Efficient LLM Reasoning | Yingchuan Zhang et.al. | 2602.02828 | null |
| 2026-02-02 | Trust by Design: Skill Profiles for Transparent, Cost-Aware LLM Routing | Mika Okamoto et.al. | 2602.02386 | null |
| 2026-02-02 | Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing | Lingkun Long et.al. | 2602.02159 | null |
| 2026-02-02 | Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation? | Susan Liang et.al. | 2602.01623 | null |
| 2026-02-01 | Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models | Katrina Brown et.al. | 2602.01237 | null |
| 2026-02-01 | Lotus: Efficient LLM Training by Randomized Low-Rank Gradient Projection with Adaptive Subspace Switching | Tianhao Miao et.al. | 2602.01233 | null |
| 2026-02-01 | A State-Transition Framework for Efficient LLM Reasoning | Liang Zhang et.al. | 2602.01198 | null |
| 2026-02-01 | ReLayout: Versatile and Structure-Preserving Design Layout Editing via Relation-Aware Design Reconstruction | Jiawei Lin et.al. | 2602.01046 | null |
| 2026-02-01 | ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning | Zhishen Sun et.al. | 2602.01003 | null |
| 2026-01-31 | Sparsity-Aware Unlearning for Large Language Models | Yuze Wang et.al. | 2602.00577 | null |
| 2026-01-30 | Fast Forward: Accelerating LLM Prefill with Predictive FFN Sparsity | Aayush Gautam et.al. | 2602.00397 | null |
| 2026-01-30 | Harvest: Opportunistic Peer-to-Peer GPU Caching for LLM Inference | Nikhil Gopal et.al. | 2602.00328 | null |
| 2026-01-30 | EigenAI: Deterministic Inference, Verifiable Results | David Ribeiro Alves et.al. | 2602.00182 | null |
| 2026-01-30 | Safer Policy Compliance with Dynamic Epistemic Fallback | Joseph Marvin Imperial et.al. | 2601.23094 | null |
| 2026-01-30 | InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning | Junyou Su et.al. | 2601.23006 | null |
| 2026-01-30 | Competitive Non-Clairvoyant KV-Cache Scheduling for LLM Inference | Yiding Feng et.al. | 2601.22996 | null |
| 2026-01-30 | Matterhorn: Efficient Analog Sparse Spiking Transformer Architecture with Masked Time-To-First-Spike Encoding | Zhanglu Yan et.al. | 2601.22876 | null |
| 2026-01-30 | OSNIP: Breaking the Privacy-Utility-Efficiency Trilemma in LLM Inference via Obfuscated Semantic Null Space | Zhiyuan Cao et.al. | 2601.22752 | null |
| 2026-01-30 | CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control | Qiaoling Chen et.al. | 2601.22705 | null |
| 2026-01-30 | Small is Beautiful: A Practical and Efficient Log Parsing Framework | Minxing Wang et.al. | 2601.22590 | null |
| 2026-01-30 | SCaLRec: Semantic Calibration for LLM-enabled Cloud-Device Sequential Recommendation | Ruiqi Zheng et.al. | 2601.22543 | null |
| 2026-01-30 | Towards Resiliency in Large Language Model Serving with KevlarFlow | Shangshu Qian et.al. | 2601.22438 | null |
| 2026-01-29 | Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use | Julien Delavande et.al. | 2601.22362 | null |
| 2026-01-29 | Small Talk, Big Impact: The Energy Cost of Thanking AI | Julien Delavande et.al. | 2601.22357 | null |
| 2026-01-29 | Causal Autoregressive Diffusion Language Model | Junhao Ruan et.al. | 2601.22031 | null |
| 2026-01-29 | A Unified XAI-LLM Approach for Endotracheal Suctioning Activity Recognition | Hoang Khang Phan et.al. | 2601.21802 | null |
| 2026-01-29 | EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference | Bronislav Sidik et.al. | 2601.21758 | null |
| 2026-01-29 | ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management | Zaifeng Pan et.al. | 2601.21473 | null |
| 2026-01-29 | Theoretically Optimal Attention/FFN Ratios in Disaggregated LLM Serving | Chendong Song et.al. | 2601.21351 | null |
| 2026-01-29 | Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks | Arther Tian et.al. | 2601.21189 | null |
| 2026-01-28 | ChunkWise LoRA: Adaptive Sequence Partitioning for Memory-Efficient Low-Rank Adaptation and Accelerated LLM Inference | Ketan Thakkar et.al. | 2601.21109 | null |
| 2026-01-29 | ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler | Bohua Zou et.al. | 2601.20755 | null |
| 2026-01-29 | DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning | Yanlin Wang et.al. | 2601.20615 | null |
| 2026-01-28 | TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs | Minjae Lee et.al. | 2601.20357 | null |
| 2026-01-28 | Beyond Speedup – Utilizing KV Cache for Sampling and Reasoning | Zeyu Xing et.al. | 2601.20326 | null |
| 2026-01-28 | SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips | Jiahuan Yu et.al. | 2601.20309 | null |
| 2026-01-28 | LogSieve: Task-Aware CI Log Reduction for Sustainable LLM-Based Analysis | Marcus Emmanuel Barnes et.al. | 2601.20148 | null |
| 2026-01-27 | Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering | Fangan Dong et.al. | 2601.19847 | null |
| 2026-01-27 | Algorithmic Prompt-Augmentation for Efficient LLM-Based Heuristic Design for A* Search | Thomas Bömer et.al. | 2601.19622 | null |
| 2026-01-29 | PROTEUS: SLA-Aware Routing via Lagrangian RL for Multi-LLM Serving Systems | Amit Singh Bhatti et.al. | 2601.19402 | null |
| 2026-01-27 | DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference | Fuliang Liu et.al. | 2601.19278 | null |
| 2026-01-29 | Native LLM and MLLM Inference at Scale on Apple Silicon | Wayner Barrios et.al. | 2601.19139 | null |
| 2026-01-26 | Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective | Fangzhou Wu et.al. | 2601.18999 | null |
| 2026-01-26 | Flatter Tokens are More Valuable for Speculative Draft Model Training | Jiaming Fan et.al. | 2601.18902 | null |
| 2026-01-26 | Scaling up Privacy-Preserving ML: A CKKS Implementation of Llama-2-7B | Jaiyoung Park et.al. | 2601.18511 | null |
| 2026-01-26 | CovertComBench: The First Domain-Specific Testbed for LLMs in Wireless Covert Communication | Zhaozhi Liu et.al. | 2601.18315 | null |
| 2026-01-26 | FABLE: Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval for Multi-Document Reasoning | Lin Sun et.al. | 2601.18116 | null |
| 2026-01-25 | A Universal Load Balancing Principle and Its Application to Large Language Model Serving | Zixi Chen et.al. | 2601.17855 | null |
| 2026-01-25 | LLM-42: Enabling Determinism in LLM Inference with Verified Speculation | Raja Gond et.al. | 2601.17768 | null |
| 2026-01-25 | Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction | Jang-Hyun Kim et.al. | 2601.17668 | null |
| 2026-01-24 | GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference | Thomas Ziller et.al. | 2601.17551 | null |
| 2026-01-24 | Latent-Space Contrastive Reinforcement Learning for Stable and Efficient LLM Reasoning | Lianlei Shan et.al. | 2601.17275 | null |
| 2026-01-22 | FlexLLM: Composable HLS Library for Flexible Hybrid LLM Accelerator Design | Jiahao Zhang et.al. | 2601.15710 | null |
| 2026-01-21 | Securing LLM-as-a-Service for Small Businesses: An Industry Case Study of a Distributed Chatbot Deployment Platform | Jiazhu Xie et.al. | 2601.15528 | null |
| 2026-01-21 | MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification | Jingwei Song et.al. | 2601.15498 | null |
| 2026-01-21 | DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs | Mingxuan Song et.al. | 2601.14711 | null |
| 2026-01-21 | QMC: Efficient SLM Edge Inference via Outlier-Aware Quantization and Emergent Memories Co-Design | Nilesh Prasad Pandey et.al. | 2601.14549 | null |
| 2026-01-20 | Confident Rankings with Fewer Items: Adaptive LLM Evaluation with Continuous Scores | Esma Balkır et.al. | 2601.13885 | null |
| 2026-01-20 | ELSA: Efficient LLM-Centric Split Aggregation for Privacy-Aware Hierarchical Federated Learning over Resource-Constrained Edge Networks | Xiaohong Yang et.al. | 2601.13824 | null |
| 2026-01-20 | HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference | Zhiyuan Shi et.al. | 2601.13684 | null |
| 2026-01-20 | PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator | Yue Jiet Chong et.al. | 2601.13628 | null |
| 2026-01-19 | Explicit Cognitive Allocation: A Principle for Governed and Auditable Inference in Large Language Models | Héctor Manuel Manzanilla-Granados et.al. | 2601.13443 | null |
| 2026-01-19 | Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference | Zimeng Wu et.al. | 2601.13155 | null |
| 2026-01-19 | FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference | Chaeyoung Jung et.al. | 2601.13143 | null |
| 2026-01-19 | Sutradhara: An Intelligent Orchestrator-Engine Co-design for Tool-based Agentic Inference | Anish Biswas et.al. | 2601.12967 | null |
| 2026-01-19 | From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation | Jiahao Wang et.al. | 2601.12904 | null |
| 2026-01-23 | An Evolutionary Framework for Automatic Optimization Benchmark Generation via Large Language Models | Yuhiro Ono et.al. | 2601.12723 | null |
| 2026-01-18 | Power Aware Dynamic Reallocation For Inference | Yiwei Jiang et.al. | 2601.12241 | null |
| 2026-01-16 | RAPID-Serve: Resource-efficient and Accelerated P/D Intra-GPU Disaggregation | Amna Masood et.al. | 2601.11822 | null |
| 2026-01-16 | PASTA: A Scalable Framework for Multi-Policy AI Compliance Evaluation | Yu Yang et.al. | 2601.11702 | null |
| 2026-01-16 | HALO: Semantic-Aware Distributed LLM Inference in Lossy Edge Network | Peirong Zheng et.al. | 2601.11676 | null |
| 2026-01-15 | WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching | Xiangchen Li et.al. | 2601.11652 | null |
| 2026-01-16 | FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning | Zhihan Yang et.al. | 2601.11311 | null |
| 2026-01-16 | SwiftKV: An Edge-Oriented Attention Algorithm and Multi-Head Accelerator for Fast, Efficient LLM Decoding | Junming Zhang et.al. | 2601.10953 | null |
| 2026-01-15 | Mugi: Value Level Parallelism For Efficient LLMs | Daniel Price et.al. | 2601.10823 | null |
| 2026-01-14 | Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs | Jonathan Knoop et.al. | 2601.09527 | null |
| 2026-01-19 | RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering | Wencheng Ye et.al. | 2601.09269 | null |
| 2026-01-14 | LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference | Du Yin et.al. | 2601.09258 | null |
| 2026-01-14 | Evaluating local large language models for structured extraction from endometriosis-specific transvaginal ultrasound reports | Haiyi Li et.al. | 2601.09053 | null |
| 2026-01-13 | HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding | Qitan Lv et.al. | 2601.08273 | null |
| 2026-01-13 | Coordinated Cooling and Compute Management for AI Datacenters | Nardos Belay Abera et.al. | 2601.08113 | null |
| 2026-01-13 | Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment | Qitao Tan et.al. | 2601.08089 | null |
| 2026-01-12 | Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference | Rei Taniguchi et.al. | 2601.07667 | null |
| 2026-01-12 | ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs | Haoqian Meng et.al. | 2601.07475 | null |
| 2026-01-12 | TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees | Tianyu Liu et.al. | 2601.07353 | null |
| 2026-01-12 | Stochastic CHAOS: Why Deterministic Inference Kills, and Distributional Variability Is the Heartbeat of Artificial Cognition | Tanmay Joshi et.al. | 2601.07239 | null |
| 2026-01-11 | MicLog: Towards Accurate and Efficient LLM-based Log Parsing via Progressive Meta In-Context Learning | Jianbo Yu et.al. | 2601.07005 | null |
| 2026-01-09 | AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving | Tianhao Xu et.al. | 2601.06288 | null |
| 2026-01-07 | AutoVulnPHP: LLM-Powered Two-Stage PHP Vulnerability Detection and Automated Localization | Zhiqiang Wang et.al. | 2601.06177 | null |
| 2026-01-08 | Publishing FAIR and Machine-actionable Reviews in Materials Science: The Case for Symbolic Knowledge in Neuro-symbolic Artificial Intelligence | Jennifer D’Souza et.al. | 2601.05051 | null |
| 2026-01-14 | Challenges and Research Directions for Large Language Model Inference Hardware | Xiaoyu Ma et.al. | 2601.05047 | null |
| 2026-01-08 | CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters | Ao Sun et.al. | 2601.04885 | null |
| 2026-01-08 | Revisiting Judge Decoding from First Principles via Training-Free Distributional Divergence | Shengyin Sun et.al. | 2601.04766 | null |
| 2026-01-08 | GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models | Maanas Taneja et.al. | 2601.04719 | null |
| 2026-01-08 | Prior-Informed Zeroth-Order Optimization with Adaptive Direction Alignment for Memory-Efficient LLM Fine-Tuning | Feihu Jin et.al. | 2601.04710 | null |
| 2026-01-07 | XGrammar 2: Dynamic and Efficient Structured Generation Engine for Agentic LLMs | Linzhang Li et.al. | 2601.04426 | null |
| 2026-01-06 | Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning | Yu Luo et.al. | 2601.03320 | null |
| 2026-01-01 | $α^3$-Bench: A Unified Benchmark of Safety, Robustness, and Efficiency for LLM-Based UAV Agents over 6G Networks | Mohamed Amine Ferrag et.al. | 2601.03281 | null |
| 2026-01-06 | Joint Encoding of KV-Cache Blocks for Scalable LLM Serving | Joseph Kampeas et.al. | 2601.03067 | null |
| 2026-01-05 | LoRA-Drop: Temporal LoRA Decoding for Efficient LLM Inference | Hossein Rajabzadeh et.al. | 2601.02569 | null |
| 2026-01-04 | Structured Decomposition for LLM Reasoning: Cross-Domain Validation and Semantic Web Integration | Albert Sadowski et.al. | 2601.01609 | null |
| 2026-01-06 | Making MoE-based LLM Inference Resilient with Tarragon | Songyu Zhang et.al. | 2601.01310 | null |
| 2026-01-08 | From Policy to Logic for Efficient and Interpretable Coverage Assessment | Rhitabrat Pokharel et.al. | 2601.01266 | null |
| 2025-12-31 | Universal Conditional Logic: A Formal Language for Prompt Engineering | Anthony Mikinka et.al. | 2601.00880 | null |
| 2026-01-02 | HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts | Zihan Fang et.al. | 2601.00583 | null |
| 2026-01-01 | Revati: Transparent GPU-Free Time-Warp Emulation for LLM Serving | Amey Agrawal et.al. | 2601.00397 | null |
| 2026-01-01 | FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems | Shanli Xing et.al. | 2601.00227 | null |
| 2025-12-31 | FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference | Fen-Yu Hsieh et.al. | 2512.24713 | null |
| 2026-01-04 | Hardware Acceleration for Neural Networks: A Comprehensive Survey | Bin Xu et.al. | 2512.23914 | null |
| 2025-12-29 | Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding | Yue Guan et.al. | 2512.23858 | null |
| 2025-12-25 | Break Out the Silverware – Semantic Understanding of Stored Household Items | Michaela Levi-Richter et.al. | 2512.23739 | null |
| 2025-12-28 | Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware | Alex Khalil et.al. | 2512.23029 | null |
| 2025-12-28 | Argus: Token Aware Distributed LLM Inference Optimization | Panlong Wu et.al. | 2512.22925 | null |
| 2025-12-27 | Modality Inflation: Energy Characterization and Optimization Opportunities for MLLM Inference | Mona Moghadampanah et.al. | 2512.22695 | null |
| 2025-12-27 | Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving | Rui Li et.al. | 2512.22420 | null |
| 2025-12-22 | Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs | Xinhao Cheng et.al. | 2512.22219 | null |
| 2025-12-20 | MatKV: Trading Compute for Flash Storage in LLM Inference | Kun-Woo Shin et.al. | 2512.22195 | null |
| 2025-12-26 | Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling | Hannah Atmer et.al. | 2512.22066 | null |
| 2025-12-26 | Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models | Tingyang Sun et.al. | 2512.21884 | null |
| 2025-12-26 | LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices | Mingyu Sun et.al. | 2512.21835 | null |
| 2025-12-25 | nncase: An End-to-End Compiler for Efficient LLM Deployment on Heterogeneous Storage Architectures | Hui Guo et.al. | 2512.21571 | null |
| 2025-12-25 | Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model | Yanhao Li et.al. | 2512.21540 | null |
| 2025-12-23 | KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System | Zhongyu Xia et.al. | 2512.20299 | null |
| 2025-12-23 | Predictive-LoRA: A Proactive and Fragmentation-Aware Serverless Inference System for LLMs | Yinan Ni et.al. | 2512.20210 | null |
| 2025-12-23 | Concept Generalization in Humans and Large Language Models: Insights from the Number Game | Arghavan Bazigaran et.al. | 2512.20162 | null |
| 2025-12-22 | Demystifying LLM-as-a-Judge: Analytically Tractable Model for Inference-Time Scaling | Indranil Halder et.al. | 2512.19905 | null |
| 2025-12-22 | L4: Low-Latency and Load-Balanced LLM Serving via Length-Aware Scheduling | Yitao Yuan et.al. | 2512.19179 | null |
| 2025-12-22 | FASTRIC: Prompt Specification Language for Verifiable LLM Interactions | Wen-Long Jin et.al. | 2512.18940 | null |
| 2025-12-20 | LLM-based Few-Shot Early Rumor Detection with Imitation Agent | Fengzhu Zeng et.al. | 2512.18352 | null |
| 2025-12-20 | TraCT: Disaggregated LLM Serving with CXL Shared Memory KV Cache at Rack-Scale | Dongha Yoon et.al. | 2512.18194 | null |
| 2025-12-20 | Making Strong Error-Correcting Codes Work Effectively for HBM in AI Inference | Rui Xie et.al. | 2512.18152 | null |
| 2025-12-19 | Specification and Detection of LLM Code Smells | Brahim Mahmoudi et.al. | 2512.18020 | null |
| 2025-12-19 | CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs | Gunho Park et.al. | 2512.17970 | null |
| 2025-12-19 | Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing | Lingxiao Zhao et.al. | 2512.17574 | null |
| 2025-12-22 | Learning What to Write: Write-Gated KV for Efficient Long-Context Inference | Yen-Chieh Huang et.al. | 2512.17452 | null |
| 2025-12-18 | Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving | Jiakun Fan et.al. | 2512.17077 | null |
| 2025-12-18 | MEPIC: Memory Efficient Position Independent Caching for LLM Serving | Qian Wang et.al. | 2512.16822 | null |
| 2025-12-18 | Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference | Dhruv Deshmukh et.al. | 2512.16391 | null |
| 2025-12-18 | Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference | Arther Tian et.al. | 2512.16317 | null |
| 2025-12-18 | Fast Collaborative Inference via Distributed Speculative Decoding | Ce Zheng et.al. | 2512.16273 | null |
| 2025-12-18 | Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference | Jian Tian et.al. | 2512.16134 | null |
| 2025-12-18 | WeMusic-Agent: Efficient Conversational Music Recommendation via Knowledge Internalization and Agentic Boundary Learning | Wendong Bi et.al. | 2512.16108 | null |
| 2025-12-19 | LLM4Perf: Large Language Models Are Effective Samplers for Multi-Objective Performance Modeling | Xin Wang et.al. | 2512.16070 | null |
| 2025-12-18 | MultiPath Transfer Engine: Breaking GPU and Host-Memory Bandwidth Bottlenecks in LLM Services | Lingfeng Tang et.al. | 2512.16056 | null |
| 2025-12-16 | EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving | Shaoting Feng et.al. | 2512.14946 | null |
| 2025-12-16 | Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement | Songze Liu et.al. | 2512.14151 | null |
| 2025-12-16 | RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees | Junjie Ma et.al. | 2512.14069 | null |
| 2025-12-16 | MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning | Haoyu Fu et.al. | 2512.13636 | null |
| 2025-12-15 | PROSERVE: Unified Multi-Priority Request Scheduling for LLM Serving | Weizhe Huang et.al. | 2512.12928 | null |
| 2025-12-14 | Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM | Furong Jia et.al. | 2512.12868 | null |
| 2025-12-14 | Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution | Boyang Yan et.al. | 2512.12806 | null |
| 2025-12-14 | Fine-Grained Energy Prediction For Parallelized LLM Inference With PIE-P | Anurag Dutt et.al. | 2512.12801 | null |
| 2025-12-19 | V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval | Donghyuk Kim et.al. | 2512.12284 | null |
| 2025-12-13 | WATOS: Efficient LLM Training Strategies and Architecture Co-exploration for Wafer-scale Chip | Huizheng Wang et.al. | 2512.12279 | null |
| 2025-12-12 | Learning to Extract Context for Context-Aware LLM Inference | Minseon Kim et.al. | 2512.11986 | null |
| 2025-12-11 | CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving | Dong Liu et.al. | 2512.11920 | null |
| 2025-12-12 | PD-Swap: Prefill-Decode Logic Swapping for End-to-End LLM Inference on Edge FPGAs via Dynamic Partial Reconfiguration | Yifan Zhang et.al. | 2512.11550 | null |
| 2025-12-12 | xGR: Efficient Generative Recommendation Serving at Scale | Qingxiao Sun et.al. | 2512.11529 | null |
| 2025-12-12 | AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference | Kuan-Wei Lu et.al. | 2512.11280 | null |
| 2025-12-12 | Adaptive Soft Rolling KV Freeze with Entropy-Guided Recovery: Sublinear Memory Growth for Efficient LLM Inference | Adilet Metinov et.al. | 2512.11221 | null |
| 2025-12-11 | ESS: An Offload-Centric Latent-Cache Management Architecture for DeepSeek-V3.2-Exp | Xinhang Chen et.al. | 2512.10576 | null |
| 2025-12-11 | LLM-Auction: Generative Auction towards LLM-Native Advertising | Chujie Zhao et.al. | 2512.10551 | null |
| 2025-12-12 | BAMBO: Construct Ability and Efficiency LLM Pareto Set via Bayesian Adaptive Multi-objective Block-wise Optimization | Kesheng Chen et.al. | 2512.09972 | null |
| 2025-12-10 | GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference | Phuong Tran et.al. | 2512.09963 | null |
| 2025-12-07 | ELANA: A Simple Energy and Latency Analyzer for LLMs | Hung-Yueh Chiang et.al. | 2512.09946 | null |
| 2025-12-11 | Exqutor: Extended Query Optimizer for Vector-augmented Analytical Queries | Hyunjoon Kim et.al. | 2512.09695 | null |
| 2025-12-10 | WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving | Chiheng Lou et.al. | 2512.09472 | null |
| 2025-12-10 | ODMA: On-Demand Memory Allocation Framework for LLM Serving on LPDDR-Class Accelerators | Guoqiang Zou et.al. | 2512.09427 | null |
| 2025-12-10 | RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference | Siyuan Ma et.al. | 2512.09304 | null |
| 2025-12-09 | LaMoSys3.5D: Enabling 3.5D-IC-Based Large Language Model Inference Serving Systems via Hardware/Software Co-Design | Qipan Wang et.al. | 2512.08731 | null |
| 2025-12-09 | Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging | Yi Pan et.al. | 2512.08365 | null |
| 2025-12-08 | LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples | Yezi Liu et.al. | 2512.07375 | null |
| 2025-12-08 | Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning | Yezi Liu et.al. | 2512.07374 | null |
| 2025-12-08 | NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models | Feng Liang et.al. | 2512.07218 | null |
| 2025-12-08 | FOAM: Blocked State Folding for Memory-Efficient LLM Training | Ziqing Wen et.al. | 2512.07112 | null |
| 2025-12-08 | Leveraging KV Similarity for Online Structured Pruning in LLMs | Jungmin Lee et.al. | 2512.07090 | null |
| 2025-12-11 | LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding | Yu Yu et.al. | 2512.06982 | null |
| 2025-12-07 | PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance | Jifar Wakuma Ayana et.al. | 2512.06747 | null |
| 2025-12-07 | KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models | Sourjya Roy et.al. | 2512.06727 | null |
| 2025-12-06 | Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices | Xiangyu Li et.al. | 2512.06443 | null |
| 2025-12-05 | Compass: Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads | Boyu Li et.al. | 2512.06093 | null |
| 2025-12-05 | MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution | Sara Patel et.al. | 2512.05958 | null |
| 2025-12-05 | KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity | Damien Lesens et.al. | 2512.05916 | null |
| 2025-12-05 | From Text to Returns: Using Large Language Models for Mutual Fund Portfolio Optimization and Risk-Adjusted Allocation | Abrar Hossain Mufakir Qamar Ansari Haziq Jeelani Monia Digra Fayeq Jeelani Syed et.al. | 2512.05907 | null |
| 2025-12-05 | Optimizing Medical Question-Answering Systems: A Comparative Study of Fine-Tuned and Zero-Shot Large Language Models with RAG Framework | Tasnimul Hassan et.al. | 2512.05863 | null |
| 2025-12-05 | Capturing Classic Authorial Style in Long-Form Story Generation with GRPO Fine-Tuning | Jinlong Liu et.al. | 2512.05747 | null |
| 2025-12-05 | Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision | Lennart Maack et.al. | 2512.05740 | null |
| 2025-12-05 | Efficient Text Classification with Conformal In-Context Learning | Ippokratis Pantelidis et.al. | 2512.05732 | null |
| 2025-12-05 | LA-RL: Language Action-guided Reinforcement Learning with Safety Guarantees for Autonomous Highway Driving | Yiming Shu et.al. | 2512.05686 | null |
| 2025-12-05 | A Greek Government Decisions Dataset for Public-Sector Analysis and Insight | Giorgos Antoniou et.al. | 2512.05647 | null |
| 2025-12-05 | ProPhy: Progressive Physical Alignment for Dynamic World Simulation | Zijun Wang et.al. | 2512.05564 | null |
| 2025-12-05 | Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models | Weijue Bu et.al. | 2512.05546 | null |
| 2025-12-05 | RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs | Jonathan Geuter et.al. | 2512.05542 | null |
| 2025-12-05 | Automated Identification of Incidentalomas Requiring Follow-Up: A Multi-Anatomy Evaluation of LLM-Based and Supervised Approaches | Namu Park et.al. | 2512.05537 | null |
| 2025-12-05 | Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement | Nils Strassenburg et.al. | 2512.05525 | null |
| 2025-12-05 | Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning | Chinthani Sugandhika et.al. | 2512.05513 | null |
| 2025-12-05 | A Hybrid Approach for EMF Code Generation: Code Templates Meet Large Language Models | Xiao He et.al. | 2512.05498 | null |
| 2025-12-05 | Knowing Your Uncertainty – On the application of LLM in social sciences | Bolun Zhang et.al. | 2512.05461 | null |
| 2025-12-05 | BEAVER: An Efficient Deterministic LLM Verifier | Tarun Suresh et.al. | 2512.05439 | null |
| 2025-12-05 | A Systematic Framework for Enterprise Knowledge Retrieval: Leveraging LLM-Generated Metadata to Enhance RAG Systems | Pranav Pushkar Mishra et.al. | 2512.05411 | null |
| 2025-12-05 | SQ-format: A Unified Sparse-Quantized Hardware-friendly Data Format for LLMs | Ruixuan Huang et.al. | 2512.05409 | null |
| 2025-12-04 | Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning | Purbesh Mitra et.al. | 2512.05105 | null |
| 2025-12-04 | David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design? | Shashwat Shankar et.al. | 2512.05073 | null |
| 2025-12-04 | Arbitrage: Efficient Reasoning via Advantage-Aware Speculation | Monishwaran Maheswaran et.al. | 2512.05033 | null |
| 2025-12-04 | SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs | Hao Wang et.al. | 2512.04868 | null |
| 2025-12-04 | Model-Based and Sample-Efficient AI-Assisted Math Discovery in Sphere Packing | Rasul Tutunov et.al. | 2512.04829 | null |
| 2025-12-04 | MemLoRA: Distilling Expert Adapters for On-Device Memory Systems | Massimo Bini et.al. | 2512.04763 | null |
| 2025-12-04 | EtCon: Edit-then-Consolidate for Reliable Knowledge Editing | Ruilin Li et.al. | 2512.04753 | null |
| 2025-12-04 | RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting | Siqi Wang et.al. | 2512.04752 | null |
| 2025-12-04 | Model Whisper: Steering Vectors Unlock Large Language Models’ Potential in Test-time | Xinyue Kang et.al. | 2512.04748 | null |
| 2025-12-04 | SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs | Wenhua Cheng et.al. | 2512.04746 | null |
| 2025-12-04 | OsmT: Bridging OpenStreetMap Queries and Natural Language with Open-source Tag-aware Language Models | Zhuoyue Wan et.al. | 2512.04738 | null |
| 2025-12-04 | Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild | Yigui Feng et.al. | 2512.04728 | null |
| 2025-12-04 | TRINITY: An Evolved LLM Coordinator | Jinglue Xu et.al. | 2512.04695 | null |
| 2025-12-04 | Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective | Jae Hee Lee et.al. | 2512.04691 | null |
| 2025-12-04 | PBFuzz: Agentic Directed Fuzzing for PoV Generation | Haochen Zeng et.al. | 2512.04611 | null |
| 2025-12-04 | Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space | Joey Hong et.al. | 2512.04601 | null |
| 2025-12-04 | A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution | Huifeng Zhu et.al. | 2512.04580 | null |
| 2025-12-04 | On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference | Yue Yu et.al. | 2512.04558 | null |
| 2025-12-04 | AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees | Yangning Li et.al. | 2512.04550 | null |
| 2025-12-04 | EvoEdit: Lifelong Free-Text Knowledge Editing through Latent Perturbation Augmentation and Knowledge-driven Parameter Fusion | Pengfei Cao et.al. | 2512.04545 | null |
| 2025-12-04 | LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models | Jiaqi Sun et.al. | 2512.04474 | null |
| 2025-12-03 | Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study | Yixuan Li et.al. | 2512.04031 | null |
| 2025-12-03 | AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving | Ying Wang et.al. | 2512.04013 | null |
| 2025-12-03 | Training-Free Policy Violation Detection via Activation-Space Whitening in LLMs | Oren Rachmil et.al. | 2512.03994 | null |
| 2025-12-03 | Sponsored Questions and How to Auction Them | Kshipra Bhawalkar et.al. | 2512.03975 | null |
| 2025-12-03 | OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference | Liujianfu Wang et.al. | 2512.03927 | null |
| 2025-12-03 | UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework | Youxin Pang et.al. | 2512.03918 | null |
| 2025-12-03 | Reconstructing KV Caches with Cross-layer Fusion For Enhanced Transformers | Hongzhan Lin et.al. | 2512.03870 | null |
| 2025-12-03 | Training and Evaluation of Guideline-Based Medical Reasoning in LLMs | Michael Staniek et.al. | 2512.03838 | null |
| 2025-12-03 | Log Probability Tracking of LLM APIs | Timothée Chauvin et.al. | 2512.03816 | null |
| 2025-12-03 | Enhancing Instruction-Following Capabilities in Seq2Seq Models: DoLA Adaptations for T5 | Huey Sun et.al. | 2512.03803 | null |
| 2025-12-03 | RoCo: Role-Based LLMs Collaboration for Automatic Heuristic Design | Jiawei Xu et.al. | 2512.03762 | null |
| 2025-12-03 | AR-Med: Automated Relevance Enhancement in Medical Search via LLM-Driven Information Augmentation | Chuyue Wang et.al. | 2512.03737 | null |
| 2025-12-03 | Tutorial on Large Language Model-Enhanced Reinforcement Learning for Wireless Networks | Lingyi Cai et.al. | 2512.03722 | null |
| 2025-12-03 | Knowing oneself with and through AI: From self-tracking to chatbots | Lucy Osler et.al. | 2512.03682 | null |
| 2025-12-03 | ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers | Feice Huang et.al. | 2512.03673 | null |
| 2025-12-03 | Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning | Ge-Peng Ji et.al. | 2512.03667 | null |
| 2025-12-03 | FFTrainer: Fast Failover in Large-Language Model Training with Almost-Free State Management | Bohan Zhao et.al. | 2512.03644 | null |
| 2025-12-03 | KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing | Lishuo Deng et.al. | 2512.03608 | null |
| 2025-12-03 | EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths | Zhening Li et.al. | 2512.03571 | null |
| 2025-12-03 | State Space Models for Bioacoustics: A comparative Evaluation with Transformers | Chengyu Tang et.al. | 2512.03563 | null |
| 2025-12-03 | TokenScale: Timely and Accurate Autoscaling for Disaggregated LLM Serving with Token Velocity | Ruiqi Lai et.al. | 2512.03416 | null |
| 2025-12-03 | Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs | Ngoc Bui et.al. | 2512.03324 | null |
| 2025-12-02 | LORE: A Large Generative Model for Search Relevance | Chenji Lu et.al. | 2512.03025 | null |
| 2025-12-02 | TokenPowerBench: Benchmarking the Power Consumption of LLM Inference | Chenxu Niu et.al. | 2512.03024 | null |
| 2025-12-02 | Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge | Hamid Dadkhahi et.al. | 2512.03019 | null |
| 2025-12-02 | From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars? | Dawei Li et.al. | 2512.03005 | null |
| 2025-12-02 | FAIRY2I: Universal Extremely-Low Bit QAT framework via Widely-Linear Representation and Phase-Aware Quantization | Feiyu Wang et.al. | 2512.02901 | null |
| 2025-12-02 | MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training Paradigm | Wei Chen et.al. | 2512.02895 | null |
| 2025-12-02 | OptPO: Optimal Rollout Allocation for Test-time Policy Optimization | Youkang Wang et.al. | 2512.02882 | null |
| 2025-12-02 | Network Self-Configuration based on Fine-Tuned Small Language Models | Oscar G. Lira et.al. | 2512.02861 | null |
| 2025-12-02 | GraphMatch: Fusing Language and Graph Representations in a Dynamic Two-Sided Work Marketplace | Mikołaj Sacha et.al. | 2512.02849 | null |
| 2025-12-02 | Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages | Lechen Zhang et.al. | 2512.02841 | null |
| 2025-12-02 | Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach | Siyuan Yang et.al. | 2512.02834 | null |
| 2025-12-02 | A Comparative Study on How Data Normalization Affects Zero-Shot Generalization in Time Series Foundation Models | Ihab Ahmed et.al. | 2512.02833 | null |
| 2025-12-02 | Phase-Adaptive LLM Framework with Multi-Stage Validation for Construction Robot Task Allocation: A Systematic Benchmark Against Traditional Optimization Algorithms | Shyam prasad reddy Kaitha et.al. | 2512.02810 | null |
| 2025-12-02 | FiMMIA: scaling semantic perturbation-based membership inference across modalities | Anton Emelyanov et.al. | 2512.02786 | null |
| 2025-12-02 | PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models | Robert Belanec et.al. | 2512.02764 | null |
| 2025-12-02 | RoboWheel: A Data Engine from Real-World Human Demonstrations for Cross-Embodiment Robotic Learning | Yuhong Zhang et.al. | 2512.02729 | null |
| 2025-12-02 | AuditCopilot: Leveraging LLMs for Fraud Detection in Double-Entry Bookkeeping | Md Abdul Kadir et.al. | 2512.02726 | null |
| 2025-12-02 | Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs | Julian Ma et.al. | 2512.02719 | null |
| 2025-12-02 | CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer | Lavish Bansal et.al. | 2512.02711 | null |
| 2025-12-02 | VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm | Zhenkai Wu et.al. | 2512.02700 | null |
| 2025-12-01 | Trinity: Disaggregating Vector Search from Prefill-Decode Disaggregation in LLM Serving | Yi Liu et.al. | 2512.02281 | null |
| 2025-12-01 | Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling | Jack Cook et.al. | 2512.02010 | link |
| 2025-12-01 | The Art of Scaling Test-Time Compute for Large Language Models | Aradhye Agarwal et.al. | 2512.02008 | null |
| 2025-12-01 | Low-Rank Prehab: Preparing Neural Networks for SVD Compression | Haoran Qin et.al. | 2512.01980 | link |
| 2025-12-01 | KV Pareto: Systems-Level Optimization of KV Cache and Model Compression for Long Context Inference | Sai Gokhale et.al. | 2512.01953 | null |
| 2025-12-01 | Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models | Zhongyu Yang et.al. | 2512.01949 | null |
| 2025-12-01 | Agentic Policy Optimization via Instruction-Policy Co-Evolution | Han Zhou et.al. | 2512.01945 | link |
| 2025-12-01 | An Empirical Study of Agent Developer Practices in AI Agent Frameworks | Yanlin Wang et.al. | 2512.01939 | null |
| 2025-12-01 | Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive Decoding | Zahra Mahdavi et.al. | 2512.01922 | null |
| 2025-12-01 | Latent Debate: A Surrogate Framework for Interpreting LLM Thinking | Lihu Chen et.al. | 2512.01909 | null |
| 2025-12-01 | CauSight: Learning to Supersense for Visual Causal Discovery | Yize Zhang et.al. | 2512.01827 | null |
| 2025-12-01 | Generating REST API Tests With Descriptive Names | Philip Garrett et.al. | 2512.01690 | null |
| 2025-12-01 | DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models | Patrick Kwon et.al. | 2512.01686 | null |
| 2025-12-01 | A Systematic Characterization of LLM Inference on GPUs | Haonan Wang et.al. | 2512.01644 | null |
| 2025-12-01 | Agent-Kernel: A MicroKernel Multi-Agent System Framework for Adaptive Social Simulation Powered by LLMs | Yuren Mao et.al. | 2512.01610 | null |
| 2025-12-01 | LLM2Fx-Tools: Tool Calling For Music Post-Production | Seungheon Doh et.al. | 2512.01559 | null |
| 2025-12-01 | LPCD: Unified Framework from Layer-Wise to Submodule Quantization | Yuma Ichikawa et.al. | 2512.01546 | null |
| 2025-12-01 | MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages | Yexing Du et.al. | 2512.01512 | null |
| 2025-12-01 | Multi-Path Collaborative Reasoning via Reinforcement Learning | Jindi Lv et.al. | 2512.01485 | null |
| 2025-12-01 | ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation | Rohin Manvi et.al. | 2512.01457 | null |
| 2025-12-01 | *ViRectify*: A Challenging Benchmark for Video Reasoning Correction with Multimodal Large Language Models | Xusen Hei et.al. | 2512.01424 | null |
| 2025-11-30 | SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving | Bohan Zhao et.al. | 2512.00719 | null |
| 2025-11-29 | Efficient Kernel Mapping and Comprehensive System Evaluation of LLM Acceleration on a CGLA | Takuto Ando et.al. | 2512.00335 | null |
| 2025-11-28 | Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction | Bao Shu et.al. | 2511.23476 | null |
| 2025-11-28 | ThetaEvolve: Test-time Learning on Open Problems | Yiping Wang et.al. | 2511.23473 | link |
| 2025-11-28 | Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent | Jianzhe Lin et.al. | 2511.23436 | null |
| 2025-11-28 | Hierarchical AI-Meteorologist: LLM-Agent System for Multi-Scale and Explainable Weather Forecast Reporting | Daniil Sukhorukov et.al. | 2511.23387 | null |
| 2025-11-28 | Do LLM-judges Align with Human Relevance in Cranfield-style Recommender Evaluation? | Gustavo Penha et.al. | 2511.23312 | null |
| 2025-11-28 | MCP vs RAG vs NLWeb vs HTML: A Comparison of the Effectiveness and Efficiency of Different Agent Interfaces to the Web (Technical Report) | Aaron Steiner et.al. | 2511.23281 | null |
| 2025-11-28 | Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs | Jiancheng Dong et.al. | 2511.23271 | null |
| 2025-11-28 | Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering | Qiming Li et.al. | 2511.23231 | null |
| 2025-11-28 | Instruction Tuning of Large Language Models for Tabular Data Generation-in One Day | Milad Abdollahzadeh et.al. | 2511.23220 | null |
| 2025-11-28 | Obstruction reasoning for robotic grasping | Runyu Jiao et.al. | 2511.23186 | null |
| 2025-11-28 | HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding | Chen Li et.al. | 2511.23178 | null |
| 2025-11-28 | Multi-chain Graph Refinement and Selection for Reliable Reasoning in Large Language Models | Yujiao Yang et.al. | 2511.23136 | null |
| 2025-11-28 | Evolutionary Discovery of Heuristic Policies for Traffic Signal Control | Ruibing Wang et.al. | 2511.23122 | null |
| 2025-11-28 | Dripper: Token-Efficient Main HTML Extraction with a Lightweight LM | Mengjie Liu et.al. | 2511.23119 | null |
| 2025-11-28 | Conveying Imagistic Thinking in TCM Translation: A Prompt Engineering and LLM-Based Evaluation Framework | Jiatong Han et.al. | 2511.23059 | null |
| 2025-11-28 | Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match | Jinze Li et.al. | 2511.22972 | null |
| 2025-11-28 | Experts are all you need: A Composable Framework for Large Language Model Inference | Shrihari Sridharan et.al. | 2511.22955 | null |
| 2025-11-28 | Visual Puns from Idioms: An Iterative LLM-T2IM-MLLM Framework | Kelaiti Xiao et.al. | 2511.22943 | null |
| 2025-11-28 | RAG-Empowered LLM-Driven Dynamic Radio Resource Management in Open 6G RAN | Onur Salan et.al. | 2511.22933 | null |
| 2025-11-28 | AgentShield: Make MAS more secure and efficient | Kaixiang Wang et.al. | 2511.22924 | null |
| 2025-11-28 | Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems | Shashwat Jaiswal et.al. | 2511.22880 | null |
| 2025-11-27 | PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration | Junfei Zhan et.al. | 2511.22788 | null |
| 2025-11-26 | Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework | Dong Wang et.al. | 2511.21686 | null |
| 2025-11-26 | DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving | Fengze Yu et.al. | 2511.21669 | null |
| 2025-11-26 | TAGFN: A Text-Attributed Graph Dataset for Fake News Detection in the Age of LLMs | Kay Liu et.al. | 2511.21624 | null |
| 2025-11-26 | Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining | Dongyang Fan et.al. | 2511.21613 | null |
| 2025-11-26 | Auxiliary Metrics Help Decoding Skill Neurons in the Wild | Yixiu Zhao et.al. | 2511.21610 | null |
| 2025-11-26 | SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition | Peiran Xu et.al. | 2511.21471 | null |
| 2025-11-26 | MADRA: Multi-Agent Debate for Risk-Aware Embodied Planning | Junjian Wang et.al. | 2511.21460 | null |
| 2025-11-26 | A Systematic Study of Model Merging Techniques in Large Language Models | Oğuz Kağan Hitit et.al. | 2511.21437 | null |
| 2025-11-26 | Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM | Tim Trappen et.al. | 2511.21413 | null |
| 2025-11-26 | Prune4Web: DOM Tree Pruning Programming for Web Agent | Jiayuan Zhang et.al. | 2511.21398 | null |
| 2025-11-26 | PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark | Robert Belanec et.al. | 2511.21285 | null |
| 2025-11-26 | Multi-Reward GRPO for Stable and Prosodic Single-Codebook TTS LLMs at Scale | Yicheng Zhong et.al. | 2511.21270 | null |
| 2025-11-26 | Can Finetuing LLMs on Small Human Samples Increase Heterogeneity, Alignment, and Belief-Action Coherence? | Steven Wang et.al. | 2511.21218 | null |
| 2025-11-26 | Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation | Joonhyung Park et.al. | 2511.21185 | null |
| 2025-11-26 | How to Correctly Report LLM-as-a-Judge Evaluations | Chungpa Lee et.al. | 2511.21140 | null |
| 2025-11-26 | Beyond Patch Aggregation: 3-Pass Pyramid Indexing for Vision-Enhanced Document Retrieval | Anup Roy et.al. | 2511.21121 | null |
| 2025-11-26 | BRIDGE: Building Representations In Domain Guided Program Verification | Robert Joseph George et.al. | 2511.21104 | null |
| 2025-11-26 | MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts | Ivan Novikov et.al. | 2511.21089 | null |
| 2025-11-26 | 5G Network Automation Using Local Large Language Models and Retrieval-Augmented Generation | Ahmadreza Majlesara et.al. | 2511.21084 | null |
| 2025-11-26 | Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning | Zhenchao Tang et.al. | 2511.21075 | null |
| 2025-11-25 | LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight | Yunze Man et.al. | 2511.20648 | null |
| 2025-11-25 | Latent Collaboration in Multi-Agent Systems | Jiaru Zou et.al. | 2511.20639 | link |
| 2025-11-25 | ROOT: Robust Orthogonalized Optimizer for Neural Network Training | Wei He et.al. | 2511.20626 | null |
| 2025-11-25 | Copyright Detection in Large Language Models: An Ethical Approach to Generative AI Development | David Szczecina et.al. | 2511.20623 | null |
| 2025-11-25 | DiFR: Inference Verification Despite Nondeterminism | Adam Karvonen et.al. | 2511.20621 | null |
| 2025-11-25 | Translating Large-Scale C Repositories to Idiomatic Rust | Saman Dehghan et.al. | 2511.20617 | null |
| 2025-11-25 | Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models | Shamima Hossain et.al. | 2511.20531 | null |
| 2025-11-25 | Assessing LLMs’ Performance: Insights from the Chinese Pharmacist Exam | Xinran Wang et.al. | 2511.20526 | null |
| 2025-11-25 | HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation | Xiang Wang et.al. | 2511.20520 | null |
| 2025-11-25 | Soft Adaptive Policy Optimization | Chang Gao et.al. | 2511.20347 | null |
| 2025-11-25 | The Curious Case of Analogies: Investigating Analogical Reasoning in Large Language Models | Taewhoo Lee et.al. | 2511.20344 | null |
| 2025-11-25 | Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios | Luohe Shi et.al. | 2511.20340 | null |
| 2025-11-25 | Improving Language Agents through BREW | Shashank Kirtania et.al. | 2511.20297 | null |
| 2025-11-25 | APT-CGLP: Advanced Persistent Threat Hunting via Contrastive Graph-Language Pre-Training | Xuebo Qiu et.al. | 2511.20290 | null |
| 2025-11-25 | SMoG: Schema Matching on Graph | Mingyu Jeon et.al. | 2511.20285 | null |
| 2025-11-25 | Bootstrapping Physics-Grounded Video Generation through VLM-Guided Iterative Self-Refinement | Yang Liu et.al. | 2511.20280 | null |
| 2025-11-25 | HVAdam: A Full-Dimension Adaptive Optimizer | Yiheng Zhang et.al. | 2511.20277 | null |
| 2025-11-25 | LLM-Driven Transient Stability Assessment: From Automated Simulation to Neural Architecture Design | Lianzhe Hu et.al. | 2511.20276 | null |
| 2025-11-25 | Rectified Flow for Vision-Aided mmWave V2I Beam Prediction | Can Zheng et.al. | 2511.20265 | null |
| 2025-11-25 | REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance | Chuyi Kong et.al. | 2511.20233 | null |
| 2025-11-24 | Be My Eyes: Extending Large Language Models to New Modalities Through Multi-Agent Collaboration | James Y. Huang et.al. | 2511.19417 | null |
| 2025-11-24 | Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning | Qihan Huang et.al. | 2511.19343 | link |
| 2025-11-24 | Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces | Shaltiel Shmidman et.al. | 2511.19333 | null |
| 2025-11-24 | MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization | Boyuan Wu et.al. | 2511.19253 | null |
| 2025-11-24 | Learning Plug-and-play Memory for Guiding Video Diffusion Models | Selena Song et.al. | 2511.19229 | link |
| 2025-11-24 | Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization | Xurui Li et.al. | 2511.19218 | null |
| 2025-11-24 | From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation | Moazzam Umer Gondal et.al. | 2511.19149 | null |
| 2025-11-24 | LLMs-Powered Real-Time Fault Injection: An Approach Toward Intelligent Fault Test Cases Generation | Mohammad Abboush et.al. | 2511.19132 | null |
| 2025-11-24 | Facilitating the Integration of LLMs Into Online Experiments With Simple Chat | R. Bermudez Schettino et.al. | 2511.19123 | null |
| 2025-11-24 | MonoSR: Open-Vocabulary Spatial Reasoning from Monocular Images | Qirui Wang et.al. | 2511.19119 | null |
| 2025-11-24 | Large Language Model-Assisted Planning of Electric Vehicle Charging Infrastructure with Real-World Case Study | Xinda Zheng et.al. | 2511.19055 | null |
| 2025-11-24 | FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning | Xin Yuan et.al. | 2511.18977 | null |
| 2025-11-24 | SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression | Santhosh G S et.al. | 2511.18936 | null |
| 2025-11-24 | Skeletons Matter: Dynamic Data Augmentation for Text-to-Query | Yuchen Ji et.al. | 2511.18934 | null |
| 2025-11-24 | Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations | Ryan Wong et.al. | 2511.18933 | null |
| 2025-11-24 | FineXtrol: Controllable Motion Generation via Fine-Grained Text | Keming Shen et.al. | 2511.18927 | null |
| 2025-11-24 | BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models | Juncheng Li et.al. | 2511.18921 | null |
| 2025-11-24 | EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models | Wenhao Xu et.al. | 2511.18920 | null |
| 2025-11-24 | Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference | Wengyi Zhan et.al. | 2511.18875 | null |
| 2025-11-24 | KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit | Dezhi Ran et.al. | 2511.18868 | null |
| 2025-11-21 | Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models | Mark Endo et.al. | 2511.17487 | link |
| 2025-11-21 | SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding | Nikolay Nikolov et.al. | 2511.17411 | null |
| 2025-11-21 | That’s not natural: The Impact of Off-Policy Training Data on Probe Performance | Nathalie Kirch et.al. | 2511.17408 | null |
| 2025-11-21 | Beyond Multiple Choice: A Hybrid Framework for Unifying Robust Evaluation and Verifiable Reasoning Training | Yesheng Liu et.al. | 2511.17405 | null |
| 2025-11-21 | SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion | Jiajie Guo et.al. | 2511.17308 | null |
| 2025-11-21 | SlsReuse: LLM-Powered Serverless Function Reuse | Jinfeng Wen et.al. | 2511.17262 | null |
| 2025-11-21 | A Little More Like This: Text-to-Image Retrieval with Vision-Language Models Using Relevance Feedback | Bulat Khaertdinov et.al. | 2511.17255 | null |
| 2025-11-21 | E³-Pruner: Towards Efficient, Economical, and Effective Layer Pruning for Large Language Models | Tao Yuan et.al. | 2511.17205 | null |
| 2025-11-21 | AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale | Ziyang Wang et.al. | 2511.17190 | null |
| 2025-11-21 | Efficient Robot Design with Multi-Objective Black-Box Optimization and Large Language Models | Kento Kawaharazuka et.al. | 2511.17178 | null |
| 2025-11-21 | FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle | Mario Markov et.al. | 2511.17171 | null |
| 2025-11-21 | Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models | Vy Nguyen et.al. | 2511.17170 | null |
| 2025-11-21 | Learning to Compress: Unlocking the Potential of Large Language Models for Text Representation | Yeqin Zhang et.al. | 2511.17129 | null |
| 2025-11-21 | ChainV: Atomic Visual Hints Make Multimodal Reasoning Shorter and Better | Yuan Zhang et.al. | 2511.17106 | null |
| 2025-11-21 | Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models | He Huang et.al. | 2511.17094 | null |
| 2025-11-21 | MUCH: A Multilingual Claim Hallucination Benchmark | Jérémie Dentan et.al. | 2511.17081 | null |
| 2025-11-21 | Principled Design of Interpretable Automated Scoring for Large-Scale Educational Assessments | Yunsung Kim et.al. | 2511.17069 | null |
| 2025-11-21 | Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters | Zhan Su et.al. | 2511.17044 | null |
| 2025-11-21 | CLLMRec: LLM-powered Cognitive-Aware Concept Recommendation via Semantic Alignment and Prerequisite Knowledge Distillation | Xiangrui Xiong et.al. | 2511.17041 | null |
| 2025-11-21 | FIRM: Federated In-client Regularized Multi-objective Alignment for Large Language Models | Fatemeh et.al. | 2511.16992 | null |
| 2025-11-20 | Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter | Qinghao Hu et.al. | 2511.16665 | null |
| 2025-11-20 | Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs | Ali Taghibakhshi et.al. | 2511.16664 | null |
| 2025-11-20 | Comparison of Text-Based and Image-Based Retrieval in Multimodal Retrieval Augmented Generation Large Language Model Systems | Elias Lumer et.al. | 2511.16654 | null |
| 2025-11-20 | You Only Forward Once: An Efficient Compositional Judging Paradigm | Tianlong Zhang et.al. | 2511.16600 | null |
| 2025-11-20 | TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding | Boshen Xu et.al. | 2511.16595 | null |
| 2025-11-20 | Integrating Symbolic Natural Language Understanding and Language Models for Word Sense Disambiguation | Kexin Zhao et.al. | 2511.16577 | null |
| 2025-11-20 | Utilizing Large Language Models for Zero-Shot Medical Ontology Extension from Clinical Notes | Guanchen Wu et.al. | 2511.16548 | null |
| 2025-11-20 | The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation | Jiaheng Zhang et.al. | 2511.16543 | null |
| 2025-11-20 | Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks | Éloïse Benito-Rodriguez et.al. | 2511.16540 | null |
| 2025-11-20 | LLM4EO: Large Language Model for Evolutionary Optimization in Flexible Job Shop Scheduling | Rongjie Liao et.al. | 2511.16485 | null |
| 2025-11-20 | Optimizing Federated Learning in the Era of LLMs: Message Quantization and Streaming | Ziyue Xu et.al. | 2511.16450 | null |
| 2025-11-20 | An Efficient LLM-based Evolutional Recommendation with Locate-Forget-Update Paradigm | Hao Liu et.al. | 2511.16414 | null |
| 2025-11-20 | CorrectHDL: Agentic HDL Design with LLMs Leveraging High-Level Synthesis as Reference | Kangwei Xu et.al. | 2511.16395 | null |
| 2025-11-20 | Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement | Jiashu Yao et.al. | 2511.16331 | null |
| 2025-11-20 | ARK: Answer-Centric Retriever Tuning via KG-augmented Curriculum Learning | Jiawei Zhou et.al. | 2511.16326 | null |
| 2025-11-20 | SDA: Steering-Driven Distribution Alignment for Open LLMs without Fine-Tuning | Wei Xia et.al. | 2511.16324 | null |
| 2025-11-20 | “To Survive, I Must Defect”: Jailbreaking LLMs via the Game-Theory Scenarios | Zhen Sun et.al. | 2511.16278 | null |
| 2025-11-20 | Pass@k Metric for RLVR: A Diagnostic Tool of Exploration, But Not an Objective | Yang Yu et.al. | 2511.16231 | null |
| 2025-11-20 | Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security | Wei Zhao et.al. | 2511.16229 | null |
| 2025-11-20 | Beyond Code Similarity: Benchmarking the Plausibility, Efficiency, and Complexity of LLM-Generated Smart Contracts | Francesco Salzano et.al. | 2511.16224 | null |
| 2025-11-19 | MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping | Yushi Huang et.al. | 2511.15690 | null |
| 2025-11-19 | DuoZone: A User-Centric, LLM-Guided Mixed-Initiative XR Window Management System | Jing Qian et.al. | 2511.15676 | null |
| 2025-11-19 | Quantum-Guided Test Case Minimization for LLM-Based Code Generation | Huixiang Zhang et.al. | 2511.15665 | null |
| 2025-11-19 | HSKBenchmark: Modeling and Benchmarking Chinese Second Language Acquisition in Large Language Models through Curriculum Tuning | Qihao Yang et.al. | 2511.15574 | null |
| 2025-11-19 | A Tensor Compiler for Processing-In-Memory Architectures | Peiming Yang et.al. | 2511.15503 | null |
| 2025-11-19 | Insights from the ICLR Peer Review and Rebuttal Process | Amir Hossein Kargaran et.al. | 2511.15462 | null |
| 2025-11-19 | Know Your Intent: An Autonomous Multi-Perspective LLM Agent Framework for DeFi User Transaction Intent Mining | Qian’ang Mao et.al. | 2511.15456 | null |
| 2025-11-19 | CroPS: Improving Dense Retrieval with Cross-Perspective Positive Samples in Short-Video Search | Ao Xie et.al. | 2511.15443 | null |
| 2025-11-19 | Small Language Models for Phishing Website Detection: Cost, Performance, and Privacy Trade-Offs | Georg Goldenits et.al. | 2511.15434 | null |
| 2025-11-19 | DEPO: Dual-Efficiency Preference Optimization for LLM Agents | Sirui Chen et.al. | 2511.15392 | null |
| 2025-11-19 | Unveiling Inference Scaling for Difference-Aware User Modeling in LLM Personalization | Suyu Chen et.al. | 2511.15389 | null |
| 2025-11-19 | A Compliance-Preserving Retrieval System for Aircraft MRO Task Search | Byungho Jo et.al. | 2511.15383 | null |
| 2025-11-19 | HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning | Alexis Correa-Guillén et.al. | 2511.15355 | null |
| 2025-11-19 | Reflexive Evidence-Based Multimodal Learning for Clean Energy Transitions: Causal Insights on Cooking Fuel Access, Urbanization, and Carbon Emissions | Shan Shan et.al. | 2511.15342 | null |
| 2025-11-19 | What Your Features Reveal: Data-Efficient Black-Box Feature Inversion Attack for Split DNNs | Zhihan Ren et.al. | 2511.15316 | null |
| 2025-11-19 | EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control | Kai Yang et.al. | 2511.15248 | null |
| 2025-11-19 | OEMA: Ontology-Enhanced Multi-Agent Collaboration Framework for Zero-Shot Clinical Named Entity Recognition | Xinli Tao et.al. | 2511.15211 | null |
| 2025-11-19 | As If We’ve Met Before: LLMs Exhibit Certainty in Recognizing Seen Files | Haodong Li et.al. | 2511.15192 | null |
| 2025-11-19 | A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models | Duo Li et.al. | 2511.15098 | null |
| 2025-11-19 | Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference | Kexin Chu et.al. | 2511.15015 | null |
| 2025-11-18 | Natural Language Interfaces for Databases: What Do Users Think? | Panos Ipeirotis et.al. | 2511.14718 | null |
| 2025-11-18 | Strategic Innovation Management in the Age of Large Language Models Market Intelligence, Adaptive R&D, and Ethical Governance | Raha Aghaei et.al. | 2511.14709 | null |
| 2025-11-18 | Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models | Rui Zhu et.al. | 2511.14694 | link |
| 2025-11-18 | Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer | Kallol Mondal et.al. | 2511.14691 | null |
| 2025-11-18 | SkillGen: Learning Domain Skills for In-Context Sequential Decision Making | Ruomeng Ding et.al. | 2511.14670 | null |
| 2025-11-18 | Bias in, Bias out: Annotation Bias in Multilingual Large Language Models | Xia Cui et.al. | 2511.14662 | null |
| 2025-11-18 | AutoTool: Efficient Tool Selection for Large Language Model Agents | Jingyi Jia et.al. | 2511.14650 | null |
| 2025-11-18 | Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning | Ruoyu Qin et.al. | 2511.14617 | null |
| 2025-11-18 | A Controllable Perceptual Feature Generative Model for Melody Harmonization via Conditional Variational Autoencoder | Dengyun Huang et.al. | 2511.14600 | null |
| 2025-11-18 | OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models | Keda Tao et.al. | 2511.14582 | null |
| 2025-11-18 | Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language | Minyoung Hwang et.al. | 2511.14565 | null |
| 2025-11-18 | LLM-Assisted Thematic Analysis: Opportunities, Limitations, and Recommendations | Tatiane Ornelas et.al. | 2511.14528 | null |
| 2025-11-18 | CLO: Efficient LLM Inference System with CPU-Light KVCache Offloading via Algorithm-System Co-Design | Jiawei Yi et.al. | 2511.14510 | null |
| 2025-11-18 | Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks | Mulei Ma et.al. | 2511.14450 | null |
| 2025-11-18 | Watchdogs and Oracles: Runtime Verification Meets Large Language Models for Autonomous Systems | Angelo Ferrando et.al. | 2511.14435 | null |
| 2025-11-18 | When Words Change the Model: Sensitivity of LLMs for Constraint Programming Modelling | Alessio Pellegrino et.al. | 2511.14334 | null |
| 2025-11-18 | PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models | Yu Liu et.al. | 2511.14256 | null |
| 2025-11-18 | Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning | Rui Liu et.al. | 2511.14249 | null |
| 2025-11-18 | N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator | Zheyu Lin et.al. | 2511.14195 | null |
| 2025-11-18 | AdaTok: Adaptive Token Compression with Object-Aware Representations for Efficient Multimodal LLMs | Xinliang Zhang et.al. | 2511.14169 | null |
| 2025-11-17 | TZ-LLM: Protecting On-Device Large Language Models with Arm TrustZone | Xunjie Wang et.al. | 2511.13717 | null |
| 2025-11-17 | Generalist Foundation Models Are Not Clinical Enough for Hospital Operations | Lavender Y. Jiang et.al. | 2511.13703 | null |
| 2025-11-17 | T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization | Hyunwoo Oh et.al. | 2511.13676 | null |
| 2025-11-17 | Part-X-MLLM: Part-aware 3D Multimodal Large Language Model | Chunshi Wang et.al. | 2511.13647 | link |
| 2025-11-17 | Data Value in the Age of Scaling: Understanding LLM Scaling Dynamics Under Real-Synthetic Data Mixtures | Haohui Wang et.al. | 2511.13640 | null |
| 2025-11-17 | CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product | Kaiwen Xue et.al. | 2511.13626 | null |
| 2025-11-17 | P1: Mastering Physics Olympiads with Reinforcement Learning | Jiacheng Chen et.al. | 2511.13612 | null |
| 2025-11-17 | Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents | Piaohong Wang et.al. | 2511.13593 | null |
| 2025-11-17 | Automated Construction of Medical Indicator Knowledge Graphs Using Retrieval Augmented Large Language Models | Zhengda Wang et.al. | 2511.13526 | null |
| 2025-11-17 | FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI | Yuhang Peng et.al. | 2511.13524 | null |
| 2025-11-17 | Tight and Practical Privacy Auditing for Differentially Private In-Context Learning | Yuyang Xia et.al. | 2511.13502 | null |
| 2025-11-17 | Multi-Agent Multimodal Large Language Model Framework for Automated Interpretation of Fuel Efficiency Analytics in Public Transportation | Zhipeng Ma et.al. | 2511.13476 | null |
| 2025-11-17 | Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline | Rui Zuo et.al. | 2511.13442 | null |
| 2025-11-17 | Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction | Zhaopei Huang et.al. | 2511.13410 | null |
| 2025-11-17 | A Novel Hierarchical Integration Method for Efficient Model Merging in Medical LLMs | Prakrit Timilsina et.al. | 2511.13373 | null |
| 2025-11-17 | Cognitive Maps in Language Models: A Mechanistic Analysis of Spatial Planning | Caroline Baumgartner et.al. | 2511.13371 | null |
| 2025-11-17 | FLOWER: Flow-Oriented Entity-Relationship Tool | Dmitry Moskalev et.al. | 2511.13357 | null |
| 2025-11-17 | An LLM-based Quantitative Framework for Evaluating High-Stealthy Backdoor Risks in OSS Supply Chains | Zihe Yan et.al. | 2511.13341 | null |
| 2025-11-17 | ZeroDexGrasp: Zero-Shot Task-Oriented Dexterous Grasp Synthesis with Prompt-Based Multi-Stage Semantic Reasoning | Juntao Jian et.al. | 2511.13327 | null |
| 2025-11-17 | Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment | Jea Kwon et.al. | 2511.13290 | null |
| 2025-11-14 | Optimizing Mixture of Block Attention | Guangxuan Xiao et.al. | 2511.11571 | null |
| 2025-11-14 | Experience-Guided Adaptation of Inference-Time Reasoning Strategies | Adam Stein et.al. | 2511.11519 | null |
| 2025-11-14 | W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search | Zhenyu Ding et.al. | 2511.11518 | link |
| 2025-11-14 | PAS : Prelim Attention Score for Detecting Object Hallucinations in Large Vision–Language Models | Nhat Hoang-Xuan et.al. | 2511.11502 | null |
| 2025-11-14 | Benchmarking Visual LLMs Resilience to Unanswerable Questions on Visually Rich Documents | Davide Napolitano et.al. | 2511.11468 | null |
| 2025-11-14 | CURENet: Combining Unified Representations for Efficient Chronic Disease Prediction | Cong-Tinh Dao et.al. | 2511.11423 | null |
| 2025-11-14 | SCRUTINEER: Detecting Logic-Level Usage Violations of Reusable Components in Smart Contracts | Xingshuang Lin et.al. | 2511.11411 | null |
| 2025-11-14 | MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism | Shulin Liu et.al. | 2511.11373 | null |
| 2025-11-14 | SEAL: Subspace-Anchored Watermarks for LLM Ownership | Yanbo Dai et.al. | 2511.11356 | null |
| 2025-11-14 | UFO³: Weaving the Digital Agent Galaxy | Chaoyun Zhang et.al. | 2511.11332 | null |
| 2025-11-14 | LAET: A Layer-wise Adaptive Ensemble Tuning Framework for Pretrained Language Models | Jawad Ibn Ahad et.al. | 2511.11315 | null |
| 2025-11-14 | iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference | Wei Fan et.al. | 2511.11306 | null |
| 2025-11-14 | EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment | Ruoxi Cheng et.al. | 2511.11301 | null |
| 2025-11-14 | GraphPilot: Grounded Scene Graph Conditioning for Language-Based Autonomous Driving | Fabian Schmidt et.al. | 2511.11266 | null |
| 2025-11-14 | KGQuest: Template-Driven QA Generation from Knowledge Graphs with LLM-Based Refinement | Sania Nayab et.al. | 2511.11258 | null |
| 2025-11-14 | T-MAN: Enabling End-to-End Low-Bit LLM Inference on NPUs via Unified Table Lookup | Jianyu Wei et.al. | 2511.11248 | null |
| 2025-11-14 | STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models | Huajian Zhang et.al. | 2511.11233 | null |
| 2025-11-14 | Questioning the Stability of Visual Question Answering | Amir Rosenfeld et.al. | 2511.11206 | null |
| 2025-11-14 | Viper-F1: Fast and Fine-Grained Multimodal Understanding with Cross-Modal State-Space Modulation | Quoc-Huy Trinh et.al. | 2511.11177 | null |
| 2025-11-14 | Explainable Deep Convolutional Multi-Type Anomaly Detection | Alex George et.al. | 2511.11165 | null |
| 2025-11-13 | ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference | Yesheng Liang et.al. | 2511.10645 | null |
| 2025-11-13 | Textual understanding boost in the WikiRace | Raman Ebrahimi et.al. | 2511.10585 | null |
| 2025-11-13 | URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding | Yongxin Shi et.al. | 2511.10552 | link |
| 2025-11-13 | Don’t Waste It: Guiding Generative Recommenders with Structured Human Priors via Multi-head Decoding | Yunkai Zhang et.al. | 2511.10492 | link |
| 2025-11-13 | Scalable Synthesis of distributed LLM workloads through Symbolic Tensor Graphs | Changhai Man et.al. | 2511.10480 | null |
| 2025-11-13 | AgentEvolver: Towards Efficient Self-Evolving Agent System | Yunpeng Zhai et.al. | 2511.10395 | link |
| 2025-11-13 | SITA: A Framework for Structure-to-Instance Theorem Autoformalization | Chenyi Li et.al. | 2511.10356 | null |
| 2025-11-13 | EDGC: Entropy-driven Dynamic Gradient Compression for Efficient LLM Training | Qingao Yi et.al. | 2511.10333 | null |
| 2025-11-13 | Rethinking Visual Information Processing in Multimodal LLMs | Dongwan Kim et.al. | 2511.10301 | null |
| 2025-11-13 | Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models | Zhengtao Zou et.al. | 2511.10292 | null |
| 2025-11-13 | FactGuard: Event-Centric and Commonsense-Guided Fake News Detection | Jing He et.al. | 2511.10281 | null |
| 2025-11-13 | Quality Assurance of LLM-generated Code: Addressing Non-Functional Quality Characteristics | Xin Sun et.al. | 2511.10271 | null |
| 2025-11-13 | LangGPS: Language Separability Guided Data Pre-Selection for Joint Multilingual Instruction Tuning | Yangfan Ye et.al. | 2511.10229 | null |
| 2025-11-13 | Persona-Aware Alignment Framework for Personalized Dialogue Generation | Guanrong Li et.al. | 2511.10215 | null |
| 2025-11-13 | Advanced Black-Box Tuning of Large Language Models with Limited API Calls | Zhikang Xie et.al. | 2511.10210 | null |
| 2025-11-13 | EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models | Junquan Huang et.al. | 2511.10201 | null |
| 2025-11-13 | Efficient Thought Space Exploration through Strategic Intervention | Ziheng Li et.al. | 2511.10038 | null |
| 2025-11-13 | AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models | Xinyi Wang et.al. | 2511.10017 | null |
| 2025-11-13 | AssertMiner: Module-Level Spec Generation and Assertion Mining using Static Analysis Guided LLMs | Hongqin Lyu et.al. | 2511.10007 | null |
| 2025-11-13 | PustakAI: Curriculum-Aligned and Interactive Textbooks Using Large Language Models | Shivam Sharma et.al. | 2511.10002 | null |
| 2025-11-10 | Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models | Tianrui Song et.al. | 2511.07295 | link |
| 2025-11-10 | LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure | Jaehong Cho et.al. | 2511.07229 | null |
| 2025-11-10 | Importance-Aware Data Selection for Efficient LLM Instruction Tuning | Tingyu Jiang et.al. | 2511.07074 | null |
| 2025-11-10 | GoCkpt: Gradient-Assisted Multi-Step overlapped Checkpointing for Efficient LLM Training | Keyao Zhang et.al. | 2511.07035 | null |
| 2025-11-10 | P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats | Yuzong Chen et.al. | 2511.06838 | null |
| 2025-11-09 | Efficient LLM Safety Evaluation through Multi-Agent Debate | Dachuan Lin et.al. | 2511.06396 | null |
| 2025-11-09 | ELEGANCE: Efficient LLM Guidance for Audio-Visual Target Speech Extraction | Wenxuan Wu et.al. | 2511.06288 | null |
| 2025-11-09 | Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism | Cong Li et.al. | 2511.06247 | null |
| 2025-11-09 | Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning | Sangmook Lee et.al. | 2511.06190 | null |
| 2025-11-09 | LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs | Zifan He et.al. | 2511.06174 | null |
| 2025-11-08 | Lethe: Layer- and Time-Adaptive KV Cache Pruning for Reasoning-Intensive LLM Serving | Hui Zeng et.al. | 2511.06029 | null |
| 2025-11-08 | MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference | Myunghyun Rhee et.al. | 2511.06010 | null |
| 2025-11-08 | MCP-RiskCue: Can LLM infer risk information from MCP server System Logs? | Jiayi Fu et.al. | 2511.05867 | null |
| 2025-11-05 | From Prompts to Power: Measuring the Energy Footprint of LLM Inference | Francisco Caravaca et.al. | 2511.05597 | null |
| 2025-11-06 | DuetServe: Harmonizing Prefill and Decode for LLM Serving via Adaptive GPU Multiplexing | Lei Gao et.al. | 2511.04791 | null |
| 2025-11-06 | Enabling Dynamic Sparsity in Quantized LLM Inference | Rongxiang Wang et.al. | 2511.04477 | null |
| 2025-11-06 | E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce | Ge Zhang et.al. | 2511.04087 | null |
| 2025-11-06 | PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration | Yue Jiet Chong et.al. | 2511.04036 | null |
| 2025-11-06 | LLM-Driven Adaptive Source-Sink Identification and False Positive Mitigation for Static Analysis | Shiyin Lin et.al. | 2511.04023 | null |
| 2025-11-05 | RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse | Yinsicheng Jiang et.al. | 2511.03475 | null |
| 2025-11-07 | UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM | Hai Huang et.al. | 2511.03293 | null |
| 2025-11-04 | Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes | Mohammadsajad Alipour et.al. | 2511.02681 | null |
| 2025-11-04 | Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks | Xiumei Deng et.al. | 2511.02647 | null |
| 2025-11-04 | Verifying LLM Inference to Prevent Model Weight Exfiltration | Roy Rinberg et.al. | 2511.02620 | null |
| 2025-11-04 | KV Cache Transform Coding for Compact Storage in LLM Inference | Konrad Staniszewski et.al. | 2511.01815 | null |
| 2025-11-04 | Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding | Jungyeon Koh et.al. | 2511.01695 | null |
| 2025-11-03 | Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving | Chengying Huan et.al. | 2511.01633 | null |
| 2025-11-03 | When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding | Min Fang et.al. | 2511.01282 | null |
| 2025-11-04 | CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing | Yifan Zhou et.al. | 2511.01197 | null |
| 2025-11-02 | FlexiCache: Leveraging Temporal Stability of Attention Heads for Efficient KV Cache Management | Nazmul Takbir et.al. | 2511.00868 | null |
| 2025-11-05 | FREESH: Fair, Resource- and Energy-Efficient Scheduling for LLM Serving on Heterogeneous GPUs | Xuan He et.al. | 2511.00807 | null |
| 2025-11-04 | SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding | Jameson Sandler et.al. | 2511.00606 | null |
| 2025-11-01 | FlashEVA: Accelerating LLM inference via Efficient Attention | Juan Gabriel Kostelec et.al. | 2511.00576 | null |
| 2025-11-01 | Proactive DDoS Detection and Mitigation in Decentralized Software-Defined Networking via Port-Level Monitoring and Zero-Training Large Language Models | Mohammed N. Swileh et.al. | 2511.00460 | null |
| 2025-10-31 | Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits | Dowon Kim et.al. | 2511.00321 | null |
| 2025-11-05 | PDE-SHARP: PDE Solver Hybrids through Analysis and Refinement Passes | Shaghayegh Fazliani et.al. | 2511.00183 | null |
| 2025-10-31 | AMD MI300X GPU Performance Analysis | Chandrish Ambati et.al. | 2510.27583 | null |
| 2025-10-31 | Glia: A Human-Inspired AI for Automated Systems Design and Optimization | Pouya Hamadanian et.al. | 2510.27176 | null |
| 2025-10-29 | Category-Aware Semantic Caching for Heterogeneous LLM Workloads | Chen Wang et.al. | 2510.26835 | null |
| 2025-10-30 | Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model | Biao Zhang et.al. | 2510.26622 | null |
| 2025-10-30 | 1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models | Zeliang Zong et.al. | 2510.26446 | null |
| 2025-10-30 | Beyond Benchmarks: The Economics of AI Inference | Boqin Zhuang et.al. | 2510.26136 | null |
| 2025-10-31 | AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache | Dinghong Song et.al. | 2510.25979 | link |
| 2025-10-31 | NeuronMM: High-Performance Matrix Multiplication for LLM Inference on AWS Trainium | Dinghong Song et.al. | 2510.25977 | null |
| 2025-10-29 | A Survey on Efficient Large Language Model Training: From Data-centric Perspectives | Junyu Luo et.al. | 2510.25817 | null |
| 2025-10-29 | Serve Programs, Not Prompts | In Gim et.al. | 2510.25412 | null |
| 2025-10-29 | GPTOpt: Towards Efficient LLM-Based Black-Box Optimization | Jamison Meindl et.al. | 2510.25404 | null |
| 2025-10-29 | OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning | Ziyou Hu et.al. | 2510.24636 | null |
| 2025-10-28 | Pie: A Programmable Serving System for Emerging LLM Applications | In Gim et.al. | 2510.24051 | null |
| 2025-10-28 | Resource-Efficient LLM Application for Structured Transformation of Unstructured Financial Contracts | Maruf Ahmed Mridul et.al. | 2510.23990 | null |
| 2025-10-26 | Batch Speculative Decoding Done Right | Ranran Haoran Zhang et.al. | 2510.22876 | null |
| 2025-10-26 | TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination | Omar Naim et.al. | 2510.22767 | null |
| 2025-10-26 | Do Stop Me Now: Detecting Boilerplate Responses with a Single Iteration | Yuval Kainan et.al. | 2510.22679 | null |
| 2025-10-26 | SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size | Jinhan Chen et.al. | 2510.22556 | null |
| 2025-10-23 | Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples | Shiva Sreeram et.al. | 2510.20800 | null |
| 2025-10-23 | RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging | Bowen Wang et.al. | 2510.20479 | null |
| 2025-10-22 | Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs | Hongyi Liu et.al. | 2510.20064 | null |
| 2025-10-22 | AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders | Yuezhou Hu et.al. | 2510.19779 | null |
| 2025-10-22 | Are Large Language Models Sensitive to the Motives Behind Communication? | Addison J. Wu et.al. | 2510.19687 | null |
| 2025-10-22 | DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference | Xiang Liu et.al. | 2510.19669 | null |
| 2025-10-22 | Energy-Efficient and Dequantization-Free Q-LLMs: A Spiking Neural Network Approach to Salient Value Mitigation | Chenyu Wang et.al. | 2510.19498 | null |
| 2025-10-21 | EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval | Zebin Yang et.al. | 2510.18546 | null |
| 2025-10-21 | SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices | Pan Zhou et.al. | 2510.18544 | null |
| 2025-10-21 | Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs | Song Bian et.al. | 2510.18245 | null |
| 2025-10-20 | Planned Diffusion | Daniel Israel et.al. | 2510.18087 | null |
| 2025-10-20 | Language Models as Semantic Augmenters for Sequential Recommenders | Mahsa Valizadeh et.al. | 2510.18046 | null |
| 2025-10-19 | Justitia: Fair and Efficient Scheduling for LLM Applications | Mingyan Yang et.al. | 2510.17015 | null |
| 2025-10-18 | FourierCompress: Layer-Aware Spectral Activation Compression for Efficient and Accurate Collaborative LLM Inference | Jian Ma et.al. | 2510.16418 | null |
| 2025-10-16 | AMS-QUANT: Adaptive Mantissa Sharing for Floating-point Quantization | Mengtao Lv et.al. | 2510.16045 | null |
| 2025-10-16 | Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing | Tianhua Xia et.al. | 2510.16040 | null |
| 2025-10-17 | TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs | Sibo Xiao et.al. | 2510.15545 | null |
| 2025-10-16 | Tail-Optimized Caching for LLM Inference | Wenxin Zhang et.al. | 2510.15152 | null |
| 2025-10-16 | Identity-Link IRT for Label-Free LLM Evaluation: Preserving Additivity in TVD-MI Scores | Zachary Robertson et.al. | 2510.14966 | null |
| 2025-10-16 | xLLM Technical Report | Tongxuan Liu et.al. | 2510.14686 | null |
| 2025-10-16 | MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving | Jungi Lee et.al. | 2510.14557 | null |
| 2025-10-16 | FairBatching: Fairness-Aware Batch Formation for LLM Inference | Hongtao Lyu et.al. | 2510.14392 | null |
| 2025-10-16 | Qwen3Guard Technical Report | Haiquan Zhao et.al. | 2510.14276 | null |
| 2025-10-15 | Efficiently Executing High-throughput Lightweight LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management | Thanh Son Phung et.al. | 2510.14024 | null |
| 2025-10-15 | Adaptive Rescheduling in Prefill-Decode Disaggregated LLM Inference | Zhibin Wang et.al. | 2510.13668 | null |
| 2025-10-15 | F-BFQ: Flexible Block Floating-Point Quantization Accelerator for LLMs | Jude Haris et.al. | 2510.13401 | null |
| 2025-10-15 | Taming the Fragility of KV Cache Eviction in LLM Inference | Yuan Feng et.al. | 2510.13334 | null |
| 2025-10-15 | BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure | Yiyuan He et.al. | 2510.13223 | null |
| 2025-10-15 | Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference | Nikhil Bhendawade et.al. | 2510.13161 | null |
| 2025-10-21 | Retrieval-in-the-Chain: Bootstrapping Large Language Models for Generative Retrieval | Yingchen Zhang et.al. | 2510.13095 | null |
| 2025-10-14 | On the Role of Preference Variance in Preference Optimization | Jiacheng Guo et.al. | 2510.13022 | null |
| 2025-10-14 | KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems | Hancheng Ye et.al. | 2510.12872 | null |
| 2025-10-14 | Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification? | Cedric Richter et.al. | 2510.12702 | null |
| 2025-10-14 | Traveling Salesman-Based Token Ordering Improves Stability in Homomorphically Encrypted Language Models | Donghwan Rho et.al. | 2510.12343 | null |
| 2025-10-13 | FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters | Yanying Lin et.al. | 2510.11938 | null |
| 2025-10-13 | Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding | Bingjie Zhu et.al. | 2510.11331 | null |
| 2025-10-13 | An Explorative Study on Distributed Computing Techniques in Training and Inference of Large Language Models | Sheikh Azizul Hakim et.al. | 2510.11211 | null |
| 2025-10-13 | Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs | João Paulo Cardoso de Lima et.al. | 2510.11192 | null |
| 2025-10-12 | Hierarchical Optimization via LLM-Guided Objective Evolution for Mobility-on-Demand Systems | Yi Zhang et.al. | 2510.10644 | null |
| 2025-10-11 | MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation | Wentian Zhu et.al. | 2510.10271 | null |
| 2025-10-11 | CacheClip: Accelerating RAG with Effective KV Cache Reuse | Bin Yang et.al. | 2510.10129 | null |
| 2025-10-11 | Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization | Yang Li et.al. | 2510.10028 | null |
| 2025-10-10 | Evaluating LLM-Based Process Explanations under Progressive Behavioral-Input Reduction | P. van Oerle et.al. | 2510.09732 | null |
| 2025-10-10 | Layout-Aware Parsing Meets Efficient LLMs: A Unified, Scalable Framework for Resume Information Extraction and Evaluation | Fanwei Zhu et.al. | 2510.09722 | null |
| 2025-10-10 | FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference | Yu-Chen Lu et.al. | 2510.09332 | null |
| 2025-10-10 | Semantic-Condition Tuning: Fusing Graph Context with Large Language Models for Knowledge Graph Completion | Ruitong Liu et.al. | 2510.08966 | null |
| 2025-10-13 | Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors | Xin Liu et.al. | 2510.08907 | null |
| 2025-10-10 | Mozart: A Chiplet Ecosystem-Accelerator Codesign Framework for Composable Bespoke Application Specific Integrated Circuits | Haoran Jin et.al. | 2510.08873 | null |
| 2025-10-09 | When to Reason: Semantic Router for vLLM | Chen Wang et.al. | 2510.08731 | null |
| 2025-10-09 | SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference | Hengrui Zhang et.al. | 2510.08544 | null |
| 2025-10-09 | From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill | Gunjun Lee et.al. | 2510.08055 | null |
| 2025-10-09 | Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models | Zhiqing Cui et.al. | 2510.07858 | null |
| 2025-10-09 | OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference | Yuzhe Gu et.al. | 2510.07651 | null |
| 2025-10-08 | AsyncSpade: Efficient Test-Time Scaling with Asynchronous Sparse Decoding | Shuqing Luo et.al. | 2510.07486 | null |
| 2025-10-08 | Accelerating Diffusion LLM Inference via Local Determinism Propagation | Fanheng Kong et.al. | 2510.07081 | null |
| 2025-10-08 | Accelerating Sparse Ternary GEMM for Quantized LLM inference on Apple Silicon | Baraq Lipshitz et.al. | 2510.06957 | null |
| 2025-10-08 | PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs | Manuel Frank et.al. | 2510.06730 | null |
| 2025-10-07 | VecInfer: Efficient LLM Inference with Low-Bit KV Cache via Outlier-Suppressed Vector Quantization | Dingyu Yao et.al. | 2510.06175 | null |
| 2025-10-07 | lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models | Haoxin Wang et.al. | 2510.06126 | null |
| 2025-10-07 | From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs | Tianhao Zhu et.al. | 2510.05632 | null |
| 2025-10-07 | Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM | Ryan Solgi et.al. | 2510.05544 | null |
| 2025-10-07 | H1B-KV: Hybrid One-Bit Caches for Memory-Efficient Large Language Model Inference | Harshil Vejendla et.al. | 2510.05529 | null |
| 2025-10-07 | Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting | Zhongkai Yu et.al. | 2510.05497 | null |
| 2025-10-06 | KVLinC : KV Cache Quantization with Hadamard Rotation and Linear Correction | Utkarsh Saxena et.al. | 2510.05373 | null |
| 2025-10-06 | A novel hallucination classification framework | Maksym Zavhorodnii et.al. | 2510.05189 | null |
| 2025-10-06 | RevMine: An LLM-Assisted Tool for Code Review Mining and Analysis Across Git Platforms | Samah Kansab et.al. | 2510.04796 | null |
| 2025-10-06 | SpikingMamba: Towards Energy-Efficient Large Language Models via Knowledge Distillation from Mamba | Yulong Huang et.al. | 2510.04595 | null |
| 2025-10-05 | Speculative Actions: A Lossless Framework for Faster Agentic Systems | Naimeng Ye et.al. | 2510.04371 | null |
| 2025-10-05 | Toward a unified framework for data-efficient evaluation of large language models | Lele Liao et.al. | 2510.04051 | null |
| 2025-10-02 | KVComm: Enabling Efficient LLM Communication through Selective KV Sharing | Xiangyu Shi et.al. | 2510.03346 | null |
| 2025-10-03 | Best-of-Majority: Minimax-Optimal Strategy for Pass@k Inference Scaling | Qiwei Di et.al. | 2510.03199 | null |
| 2025-10-03 | Dissecting Transformers: A CLEAR Perspective towards Green AI | Hemang Jain et.al. | 2510.02810 | null |
| 2025-10-03 | TokenFlow: Responsive LLM Text Streaming Serving under Request Burst via Preemptive Scheduling | Junyi Chen et.al. | 2510.02758 | null |
| 2025-10-03 | HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference | Shubham Negi et.al. | 2510.02675 | null |
| 2025-10-02 | Litespark Technical Report: High-Throughput, Energy-Efficient LLM Training Framework | Nii Osae Osae Dade et.al. | 2510.02483 | null |
| 2025-10-01 | PolyLink: A Blockchain Based Decentralized Edge AI Platform for LLM Inference | Hongbo Liu et.al. | 2510.02395 | null |
| 2025-10-03 | Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey | Qiyuan Liu et.al. | 2510.01925 | null |
| 2025-10-02 | SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning | Shicheng Liu et.al. | 2510.01832 | null |
| 2025-10-01 | HiSpec: Hierarchical Speculative Decoding for LLMs | Avinash Kumar et.al. | 2510.01336 | null |
| 2025-10-01 | Generalized Parallel Scaling with Interdependent Generations | Harry Dong et.al. | 2510.01143 | null |
| 2025-10-01 | Prompt Curriculum Learning for Efficient LLM Post-Training | Zhaolin Gao et.al. | 2510.01135 | null |
| 2025-10-01 | Family Matters: Language Transfer and Merging for Adapting Small LLMs to Faroese | Jenny Kunz et.al. | 2510.00810 | null |
| 2025-10-01 | Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution | Alessio Devoto et.al. | 2510.00636 | null |
| 2025-10-01 | Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space? | Nandan Kumar Jha et.al. | 2510.00537 | null |
| 2025-10-01 | Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs | Kairun Zhang et.al. | 2510.00419 | null |
| 2025-10-02 | Large Language Models Inference Engines based on Spiking Neural Networks | Adarsha Balaji et.al. | 2510.00133 | null |
| 2025-10-01 | AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size | Guanxi Lu et.al. | 2509.26432 | null |
| 2025-09-30 | Toward an Unbiased Collective Memory for Efficient LLM-Based Agentic 6G Cross-Domain Management | Hatim Chergui et.al. | 2509.26200 | null |
| 2025-09-30 | Parallax: Efficient LLM Inference Service over Decentralized Environment | Chris Tong et.al. | 2509.26182 | null |
| 2025-09-30 | Accelerating LLM Inference with Precomputed Query Storage | Jay H. Park et.al. | 2509.25919 | null |
| 2025-09-30 | SAIL: SRAM-Accelerated LLM Inference System with Lookup-Table-based GEMV | Jingyao Zhang et.al. | 2509.25853 | null |
| 2025-09-29 | Scaling with Collapse: Efficient and Predictable Training of LLM Families | Shane Bergsma et.al. | 2509.25087 | null |
| 2025-09-29 | Intra-request branch orchestration for efficient LLM reasoning | Weifan Jiang et.al. | 2509.24957 | null |
| 2025-09-29 | SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching | Xinye Zhao et.al. | 2509.24832 | null |
| 2025-09-29 | SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving | Qihui Zhou et.al. | 2509.24626 | null |
| 2025-09-29 | Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding | Sungkyun Kim et.al. | 2509.24328 | null |
| 2025-07-22 | Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework | Hongyi Tang et.al. | 2507.16414 | null |
| 2025-07-21 | Efficient Routing of Inference Requests across LLM Instances in Cloud-Edge Computing | Shibo Yu et.al. | 2507.15553 | null |
| 2025-07-18 | Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need | Michael Davies et.al. | 2507.14397 | null |
| 2025-07-18 | Characterizing Communication Patterns in Distributed Large Language Model Inference | Lang Xu et.al. | 2507.14392 | null |
| 2025-07-18 | Can LLMs Infer Personality from Real World Conversations? | Jianfeng Zhu et.al. | 2507.14355 | null |
| 2025-07-14 | PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training | Pengfei Du et.al. | 2507.14202 | null |
| 2025-07-23 | Photonic Fabric Platform for AI Accelerators | Jing Ding et.al. | 2507.14000 | null |
| 2025-07-23 | DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training | Zhixin Wang et.al. | 2507.13833 | null |
| 2025-07-18 | Team of One: Cracking Complex Video QA with Model Synergy | Jun Xie et.al. | 2507.13820 | null |
| 2025-07-18 | LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues | Haoyang Li et.al. | 2507.13681 | null |
| 2025-07-17 | Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation | Genki Kusano et.al. | 2507.13525 | null |
| 2025-07-16 | Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage | Junqing Lin et.al. | 2507.12205 | null |
| 2025-07-15 | MIRAGE: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving | Ruihao Li et.al. | 2507.11507 | null |
| 2025-07-15 | Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations | Miray Özcan et.al. | 2507.11417 | null |
| 2025-07-15 | KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding | Luohe Shi et.al. | 2507.11273 | null |
| 2025-07-16 | GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning | Ziru Liu et.al. | 2507.10628 | null |
| 2025-07-14 | Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving | Wonung Kim et.al. | 2507.10178 | null |
| 2025-07-14 | Past-Future Scheduler for LLM Serving under SLA Guarantees | Ruihao Gong et.al. | 2507.10150 | null |
| 2025-07-14 | ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism | Zedong Liu et.al. | 2507.10069 | null |
| 2025-07-14 | Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference | Jiaming Cheng et.al. | 2507.09942 | null |
| 2025-07-13 | Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset | Lily Hong Zhang et.al. | 2507.09650 | null |
| 2025-07-12 | SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding | Weihong Xu et.al. | 2507.09201 | null |
| 2025-07-11 | On Evaluating Performance of LLM Inference Serving Systems | Amey Agrawal et.al. | 2507.09019 | null |
| 2025-07-11 | Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge Large Language Model Inference | Chun-Ting Chen et.al. | 2507.09010 | null |
| 2025-07-11 | Orchestration for Domain-specific Edge-Cloud Language Models | Prasoon Patidar et.al. | 2507.09003 | null |
| 2025-07-11 | InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching | Yilun Wang et.al. | 2507.08523 | null |
| 2025-07-11 | Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training | Aleksei Ilin et.al. | 2507.08284 | null |
| 2025-07-10 | Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions | Quanyan Zhu et.al. | 2507.08208 | null |
| 2025-07-10 | Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing | Junyi Wen et.al. | 2507.08045 | null |
| 2025-07-11 | Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models | Varin Sikka et.al. | 2507.07505 | null |
| 2025-07-16 | Proactive Intra-GPU Disaggregation of Prefill and Decode in LLM Serving | Xiaoxiang Shi et.al. | 2507.06608 | null |
| 2025-07-11 | QUEST: Query Optimization in Unstructured Document Analysis | Zhaoze Sun et.al. | 2507.06515 | null |
| 2025-07-08 | Voltage Regulation in Distribution Systems with Data Center Loads | Yize Chen et.al. | 2507.06416 | null |
| 2025-07-08 | Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models | Léa Dubois et.al. | 2507.05822 | null |
| 2025-07-07 | Cascade: Token-Sharded Private LLM Inference | Rahul Thomas et.al. | 2507.05228 | null |
| 2025-07-07 | MoLink: Distributed and Efficient Serving Framework for Large Models | Lewei Jin et.al. | 2507.05043 | null |
| 2025-07-16 | Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? | Yun Qu et.al. | 2507.04632 | null |
| 2025-07-09 | Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking | Tim Beyer et.al. | 2507.04446 | null |
| 2025-07-23 | Fairness Evaluation of Large Language Models in Academic Library Reference Services | Haining Wang et.al. | 2507.04224 | null |
| 2025-07-05 | Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States | Karine Karine et.al. | 2507.03871 | null |
| 2025-07-05 | OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference | Seungjun Shin et.al. | 2507.03865 | null |
| 2025-07-08 | MemOS: A Memory OS for AI System | Zhiyu Li et.al. | 2507.03724 | null |
| 2025-07-04 | Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA | Jindong Li et.al. | 2507.03308 | null |
| 2025-07-03 | HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference | Weishu Deng et.al. | 2507.03153 | null |
| 2025-06-20 | Large Language Model-Driven Surrogate-Assisted Evolutionary Algorithm for Expensive Optimization | Lindong Xie et.al. | 2507.02892 | null |
| 2025-07-03 | On the Convergence of Large Language Model Optimizer for Black-Box Network Management | Hoon Lee et.al. | 2507.02689 | null |
| 2025-07-03 | Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure | Rui Xie et.al. | 2507.02654 | null |
| 2025-07-14 | FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference | Xing Liu et.al. | 2507.02620 | null |
| 2025-07-02 | Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency | Zongpu Zhang et.al. | 2507.02135 | null |
| 2025-07-02 | AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training | Zhenyu Han et.al. | 2507.01663 | null |
| 2025-07-02 | Evaluating the Effectiveness of Direct Preference Optimization for Personalizing German Automatic Text Simplifications for Persons with Intellectual Disabilities | Yingqiang Gao et.al. | 2507.01479 | null |
| 2025-07-02 | LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation | Tianyu Liu et.al. | 2507.01449 | null |
| 2025-07-02 | EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices | Zheyu Shen et.al. | 2507.01438 | null |
| 2025-07-08 | SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech | Zhuangfei Cheng et.al. | 2507.01348 | null |
| 2025-07-02 | La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation | Kai Liu et.al. | 2507.01299 | null |
| 2025-07-01 | PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning | Xingke Yang et.al. | 2507.01216 | null |
| 2025-06-28 | A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval | Puspendu Banerjee et.al. | 2507.01058 | null |
| 2025-07-01 | VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator | Zhican Wang et.al. | 2507.00797 | null |
| 2025-07-01 | Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models | Yilun Zhang et.al. | 2507.00653 | null |
| 2025-07-01 | LLM-Mesh: Enabling Elastic Sharing for Serverless LLM Inference | Chuhao Xu et.al. | 2507.00507 | null |
| 2025-07-01 | Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs | Mohammad Firas Sada et.al. | 2507.00418 | null |
| 2025-06-30 | Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission | Faranaksadat Solat et.al. | 2507.00082 | null |
| 2025-06-30 | Scaling Human Judgment in Community Notes with LLMs | Haiwen Li et.al. | 2506.24118 | null |
| 2025-06-30 | A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications | Boyang Yang et.al. | 2506.23749 | null |
| 2025-06-28 | Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models | Tejas Vaidhya et.al. | 2506.23025 | null |
| 2025-06-28 | Smaller = Weaker? Benchmarking Robustness of Quantized LLMs in Code Generation | Sen Fang et.al. | 2506.22776 | null |
| 2025-07-01 | Not All Water Consumption Is Equal: A Water Stress Weighted Metric for Sustainable Computing | Yanran Wu et.al. | 2506.22773 | null |
| 2025-06-27 | QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization | Danush Khanna et.al. | 2506.22396 | null |
| 2025-06-27 | Towards Operational Data Analytics Chatbots – Virtual Knowledge Graph is All You Need | Junaid Ahmed Khan et.al. | 2506.22267 | null |
| 2025-06-27 | SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference | Yongchao He et.al. | 2506.22033 | null |
| 2025-06-27 | A Survey of LLM Inference Systems | James Pan et.al. | 2506.21901 | null |
| 2025-06-26 | Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces | Michael Johnston et.al. | 2506.21467 | null |
| 2025-06-26 | BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services | Zhaojiacheng Zhou et.al. | 2506.21033 | null |
| 2025-06-17 | Utility-Driven Speculative Decoding for Mixture-of-Experts | Anish Saxena et.al. | 2506.20675 | null |
| 2025-06-25 | DipSVD: Dual-importance Protected SVD for Efficient LLM Compression | Xuan Ding et.al. | 2506.20353 | null |
| 2025-07-02 | Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU | He Sun et.al. | 2506.20187 | null |
| 2025-06-24 | MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection | Zhengxiang Huang et.al. | 2506.19884 | null |
| 2025-06-24 | Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models | Jungwoo Park et.al. | 2506.19697 | null |
| 2025-06-25 | Adaptive Request Scheduling for CodeLLM Serving with SLA Guarantees | Shi Chang et.al. | 2506.19677 | null |
| 2025-06-23 | Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation | Ahmadreza Saboor Yaraghi et.al. | 2506.19045 | null |
| 2025-06-23 | WiLLM: An Open Wireless LLM Communication System | Boyi Liu et.al. | 2506.19030 | null |
| 2025-06-23 | LLMs on a Budget? Say HOLA | Zohaib Hasan Siddiqui et.al. | 2506.18952 | null |
| 2025-06-23 | CommVQ: Commutative Vector Quantization for KV Cache Compression | Junyan Li et.al. | 2506.18879 | null |
| 2025-06-26 | PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries | Steven Kolawole et.al. | 2506.18728 | null |
| 2025-06-22 | Mechanistic Interpretability in the Presence of Architectural Obfuscation | Marcos Florencio et.al. | 2506.18053 | null |
| 2025-06-22 | LLMs for Customized Marketing Content Generation and Evaluation at Scale | Haoran Liu et.al. | 2506.17863 | null |
| 2025-07-18 | LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning | Haoxuan Che et.al. | 2506.17562 | null |
| 2025-06-08 | Training-free LLM Verification via Recycling Few-shot Examples | Dongseok Lee et.al. | 2506.17251 | null |
| 2025-06-20 | Towards AI Search Paradigm | Yuchen Li et.al. | 2506.17188 | null |
| 2025-06-23 | From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents | Mohammad Amaan Sayeed et.al. | 2506.15911 | null |
| 2025-05-30 | Learn from the Past: Fast Sparse Indexing for Large Language Model Decoding | Feiyu Yao et.al. | 2506.15704 | null |
| 2025-06-18 | eLLM: Elastic Memory Management Framework for Efficient LLM Serving | Jiale Xu et.al. | 2506.15155 | null |
| 2025-06-17 | CrEst: Credibility Estimation for Contexts in LLMs via Weak Supervision | Dyah Adila et.al. | 2506.14912 | null |
| 2025-06-17 | Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching | Qizheng Zhang et.al. | 2506.14852 | null |
| 2025-06-05 | MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs | Zhenyan Lu et.al. | 2506.13772 | null |
| 2025-06-17 | Prefix-Tuning+: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention | Haonan Wang et.al. | 2506.13674 | null |
| 2025-06-16 | Vector Ontologies as an LLM world view extraction method | Kaspar Rothenfusser et.al. | 2506.13252 | link |
| 2025-06-16 | Empirical Evaluation of Large Language Models in Automated Program Repair | Jiajun Sun et.al. | 2506.13186 | null |
| 2025-06-19 | Serving Large Language Models on Huawei CloudMatrix384 | Pengfei Zuo et.al. | 2506.12708 | null |
| 2025-06-13 | Semantic Scheduling for LLM Inference | Wenyue Hua et.al. | 2506.12204 | link |
| 2025-05-21 | FlexQuant: A Flexible and Efficient Dynamic Precision Switching Framework for LLM Quantization | Fangxin Liu et.al. | 2506.12024 | null |
| 2025-06-13 | Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache | Xiaoran Liu et.al. | 2506.11886 | null |
| 2025-06-13 | GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news | Abdul Haque et.al. | 2506.11600 | null |
| 2025-06-13 | Collaborative LLM Inference via Planning for Efficient Reasoning | Byeongchan Lee et.al. | 2506.11578 | null |
| 2025-06-13 | Efficient Long-Context LLM Inference via KV Cache Clustering | Jie Hu et.al. | 2506.11418 | null |
| 2025-06-12 | From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review | Yaohui Zhang et.al. | 2506.11343 | null |
| 2025-06-12 | SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding | Ziyi Zhang et.al. | 2506.11309 | null |
| 2025-06-06 | DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration | Hanzhi Zhang et.al. | 2506.11104 | link |
| 2025-06-12 | Slimming Down LLMs Without Losing Their Minds | Qingda et.al. | 2506.10885 | null |
| 2025-06-12 | AdaptiveLLM: A Framework for Selecting Optimal Cost-Efficient LLM for Code-Generation Based on CoT Length | Junhang Cheng et.al. | 2506.10525 | link |
| 2025-06-12 | TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference | Hongbin Zhang et.al. | 2506.10470 | null |
| 2025-06-11 | A First Look at Bugs in LLM Inference Engines | Mugeng Liu et.al. | 2506.09713 | link |
| 2025-06-12 | Understanding the Performance and Power of LLM Inferencing on Edge Accelerators | Mayank Arya et.al. | 2506.09554 | null |
| 2025-06-11 | Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning | Jiayi Yuan et.al. | 2506.09501 | null |
| 2025-06-10 | Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$ | Chihiro Taguchi et.al. | 2506.08479 | null |
| 2025-07-19 | Draft-based Approximate Inference for LLMs | Kevin Galim et.al. | 2506.08373 | link |
| 2025-06-09 | MiniCPM4: Ultra-Efficient LLMs on End Devices | MiniCPM Team et.al. | 2506.07900 | link |
| 2025-06-09 | How Benchmark Prediction from Fewer Data Misses the Mark | Guanhua Zhang et.al. | 2506.07673 | link |
| 2025-06-09 | TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer Review | Yuan Chang et.al. | 2506.07642 | null |
| 2025-06-09 | MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts | Wei Tao et.al. | 2506.07533 | null |
| 2025-06-07 | Containerized In-Storage Processing and Computing-Enabled SSD Disaggregation | Miryeong Kwon et.al. | 2506.06769 | null |
| 2025-06-06 | Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques | Adarsh Prasad Behera et.al. | 2506.06579 | null |
| 2025-06-06 | Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage | Ziqi Yuan et.al. | 2506.06472 | null |
| 2025-07-08 | On the Fundamental Impossibility of Hallucination Control in Large Language Models | Michał P. Karpowicz et.al. | 2506.06382 | null |
| 2025-05-21 | Reward Is Enough: LLMs Are In-Context Reinforcement Learners | Kefan Song et.al. | 2506.06303 | null |
| 2025-06-06 | AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search | Yu Li et.al. | 2506.06017 | null |
| 2025-06-06 | FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model | Md Jueal Mia et.al. | 2506.05640 | link |
| 2025-06-11 | Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models | Yanzhao Zhang et.al. | 2506.05176 | null |
| 2025-06-05 | Are LLMs Reliable Translators of Logical Reasoning Across Lexically Diversified Contexts? | Qingchuan Li et.al. | 2506.04575 | link |
| 2025-06-04 | Cascadia: A Cascade Serving System for Large Language Models | Youhe Jiang et.al. | 2506.04203 | null |
| 2025-06-04 | SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling | Anhao Zhao et.al. | 2506.04179 | null |
| 2025-06-04 | GORACS: Group-level Optimal Transport-guided Coreset Selection for LLM-based Recommender Systems | Tiehua Mei et.al. | 2506.04015 | null |
| 2025-06-04 | Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | Junyi Chen et.al. | 2506.03887 | null |
| 2025-06-04 | Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis | Avihay Cohen et.al. | 2506.03656 | null |
| 2025-06-04 | POSS: Position Specialist Generates Better Draft for Speculative Decoding | Langlin Huang et.al. | 2506.03566 | link |
| 2025-07-10 | Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs | Jiakun Fan et.al. | 2506.03296 | null |
| 2025-06-03 | QKV Projections Require a Fraction of Their Memory | Malik Khalaf et.al. | 2506.02939 | null |
| 2025-06-03 | Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs | Shangmin Guo et.al. | 2506.02918 | null |
| 2025-06-14 | TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression | Zhong-Zhi Li et.al. | 2506.02678 | link |
| 2025-07-23 | KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider | Jiahao Wang et.al. | 2506.02634 | link |
| 2025-06-03 | HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference | Ping Gong et.al. | 2506.02572 | link |
| 2025-06-03 | Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective | Shenghua He et.al. | 2506.02553 | null |
| 2025-05-29 | NestedFP: High-Performance, Memory-Efficient Dual-Precision Floating Point Support for LLMs | Haeun Lee et.al. | 2506.02024 | null |
| 2025-05-24 | Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing | Zhaoyuan Su et.al. | 2506.02006 | null |
| 2025-05-16 | Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism | Yuhao Shen et.al. | 2506.01979 | null |
| 2025-06-02 | Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts | Spencer Banasik et.al. | 2506.01827 | null |
| 2025-05-13 | AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies | Amit Sharma et.al. | 2506.00008 | null |
| 2025-05-30 | AFLoRA: Adaptive Federated Fine-Tuning of Large Language Models with Resource-Aware Low-Rank Adaption | Yajie Zhou et.al. | 2505.24773 | null |
| 2025-05-30 | SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training | Yehonathan Refael et.al. | 2505.24749 | null |
| 2025-05-30 | Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching | Juan Wisznia et.al. | 2505.24643 | null |
| 2025-05-30 | LLM Inference Enhanced by External Knowledge: A Survey | Yu-Hsuan Lin et.al. | 2505.24377 | link |
| 2025-05-30 | SkyLB: A Locality-Aware Cross-Region Load Balancer for LLM Inference | Tian Xia et.al. | 2505.24095 | null |
| 2025-05-29 | Large Language Model Meets Constraint Propagation | Alexandre Bonlarron et.al. | 2505.24012 | null |
| 2025-05-29 | EmbAdvisor: Adaptive Cache Management for Sustainable LLM Serving | Yuyang Tian et.al. | 2505.23970 | null |
| 2025-05-29 | Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters | Hayden Moore et.al. | 2505.23554 | null |
| 2025-06-10 | Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism | Jinhui Wei et.al. | 2505.23219 | null |
| 2025-05-29 | SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference | Yinghao Tang et.al. | 2505.23022 | null |
| 2025-05-28 | Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference | Donghyeon Joo et.al. | 2505.22913 | link |
| 2025-05-28 | AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models | Feng Luo et.al. | 2505.22662 | null |
| 2025-05-28 | Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR | Mingchen Shao et.al. | 2505.22063 | null |
| 2025-05-28 | ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning | Zhendong Mi et.al. | 2505.21987 | null |
| 2025-05-28 | Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference | Yue Zhu et.al. | 2505.21919 | null |
| 2025-05-29 | EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse | Tianyu Guo et.al. | 2505.21889 | link |
| 2025-05-28 | HoliTom: Holistic Token Merging for Fast Video Large Language Models | Kele Shao et.al. | 2505.21334 | link |
| 2025-06-04 | LLMs Think, But Not In Your Flow: Reasoning-Level Personalization for Black-Box Large Language Models | Jieyong Kim et.al. | 2505.21082 | null |
| 2025-05-27 | Efficient Large Language Model Inference with Neural Block Linearization | Mete Erdogan et.al. | 2505.21077 | null |
| 2025-07-18 | FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration | Daehyeon Baek et.al. | 2505.20839 | null |
| 2025-05-26 | HAMburger: Accelerating LLM Inference via Token Smashing | Jingyu Liu et.al. | 2505.20438 | null |
| 2025-05-23 | Less Context, Same Performance: A RAG Framework for Resource-Efficient LLM-Based Clinical NLP | Satya Narayana Cheetirala et.al. | 2505.20320 | null |
| 2025-05-26 | APE: A Data-Centric Benchmark for Efficient LLM Adaptation in Text Summarization | Javier Marín et.al. | 2505.19912 | link |
| 2025-06-13 | MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE | Zongle Huang et.al. | 2505.19645 | null |
| 2025-05-26 | VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning | Maonan Wang et.al. | 2505.19486 | null |
| 2025-05-26 | BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs | Guilong Lu et.al. | 2505.19457 | link |
| 2025-05-26 | WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference | Sihan Chen et.al. | 2505.19427 | link |
| 2025-05-25 | DECA: A Near-Core LLM Decompression Accelerator Supporting Out-of-Order Invocation | Gerasimos Gerogiannis et.al. | 2505.19349 | null |
| 2025-05-25 | Can Large Language Models Infer Causal Relationships from Real-World Text? | Ryan Saklad et.al. | 2505.18931 | null |
| 2025-06-18 | ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models | Hao Chen et.al. | 2505.18799 | null |
| 2025-06-03 | A Survey of LLM $\times$ DATA | Xuanhe Zhou et.al. | 2505.18458 | null |
| 2025-05-23 | LatentLLM: Attention-Aware Joint Tensor Compression | Toshiaki Koike-Akino et.al. | 2505.18413 | null |
| 2025-05-23 | An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs | Rahul Thomas et.al. | 2505.18332 | null |
| 2025-07-01 | Two-Stage Regularization-Based Structured Pruning for LLMs | Mingkuan Feng et.al. | 2505.18232 | null |
| 2025-05-23 | NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache | Donghyun Son et.al. | 2505.18231 | null |
| 2025-05-23 | Navigating Pitfalls: Evaluating LLMs in Machine Learning Programming Education | Smitha Kumar et.al. | 2505.18220 | null |
| 2025-05-23 | Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning | Michael Hassid et.al. | 2505.17813 | null |
| 2025-05-23 | DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies | Ning Yang et.al. | 2505.17420 | null |
| 2025-05-26 | RAP: Runtime-Adaptive Pruning for LLM Inference | Huanrong Liu et.al. | 2505.17138 | null |
| 2025-05-20 | Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency | Ruixiao Li et.al. | 2505.17074 | null |
| 2025-05-16 | SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs | Jinwoo Park et.al. | 2505.17052 | null |
| 2025-05-22 | CASTILLO: Characterizing Response Length Distributions of Large Language Models | Daniel F. Perez-Ramirez et.al. | 2505.16881 | link |
| 2025-05-24 | Recursive Offloading for LLM Serving in Multi-tier Networks | Zhiyuan Wu et.al. | 2505.16502 | link |
| 2025-05-22 | Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization | Vera Neplenbroek et.al. | 2505.16467 | link |
| 2025-05-22 | LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead | Yifan Zhang et.al. | 2505.16221 | null |
| 2025-05-31 | QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design | Benjamin Schneider et.al. | 2505.16175 | link |
| 2025-05-22 | KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization | Mingbo Song et.al. | 2505.16162 | null |
| 2025-05-21 | Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning | Jinghui Lu et.al. | 2505.15154 | null |
| 2025-05-21 | BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms | Yunlong Hou et.al. | 2505.15141 | null |
| 2025-06-04 | Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity | Susav Shrestha et.al. | 2505.14884 | link |
| 2025-05-20 | ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions | Bufang Yang et.al. | 2505.14668 | null |
| 2025-05-20 | ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs | Yifan Sui et.al. | 2505.14468 | null |
| 2025-05-20 | Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning | Jiwon Song et.al. | 2505.13866 | link |
| 2025-05-19 | Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training | Shane Bergsma et.al. | 2505.13738 | null |
| 2025-05-16 | An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents | Ayesha Amjad et.al. | 2505.13504 | null |
| 2025-04-02 | Large Language Model powered Symbolic Execution | Yihe Li et.al. | 2505.13452 | null |
| 2025-05-19 | Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately | Yuhang Wang et.al. | 2505.13326 | null |
| 2025-05-19 | HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding | Siran Liu et.al. | 2505.13254 | null |
| 2025-05-19 | FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference | Guangda Liu et.al. | 2505.13109 | null |
| 2025-05-19 | EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code | Yuhao Qing et.al. | 2505.13004 | link |
| 2025-05-25 | FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | Zihua Wang et.al. | 2505.12728 | link |
| 2025-05-19 | HydraInfer: Hybrid Disaggregated Scheduling for Multimodal Large Language Model Serving | Xianzhe Dong et.al. | 2505.12658 | null |
| 2025-05-17 | Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning | Yuheng Lu et.al. | 2505.11922 | null |
| 2025-05-17 | Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture | Yu Wu et.al. | 2505.11916 | null |
| 2025-05-25 | Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning | Yansong Ning et.al. | 2505.11827 | null |
| 2025-07-10 | TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference | Raja Gond et.al. | 2505.11329 | link |
| 2025-05-23 | SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning | Zheng Li et.al. | 2505.11274 | null |
| 2025-05-16 | Vaiage: A Multi-Agent Solution to Personalized Travel Planning | Binwen Liu et.al. | 2505.10922 | null |
| 2025-05-21 | SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices | Xiangwen Zhuge et.al. | 2505.10259 | link |
| 2025-06-05 | ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production | Yuxing Xiang et.al. | 2505.09999 | link |
| 2025-05-15 | How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference | Nidhal Jegham et.al. | 2505.09598 | null |
| 2025-05-14 | Statistical Modeling and Uncertainty Estimation of LLM Inference Systems | Kaustabha Ray et.al. | 2505.09319 | null |
| 2025-05-15 | ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor | Seungbeom Choi et.al. | 2505.09142 | link |
| 2025-05-13 | ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition | Keran Zheng et.al. | 2505.08981 | null |
| 2025-06-30 | LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries | Zekun Wu et.al. | 2505.08842 | null |
| 2025-05-13 | Automatic Task Detection and Heterogeneous LLM Speculative Decoding | Danying Ge et.al. | 2505.08600 | null |
| 2025-05-08 | Scaling Laws for Speculative Decoding | Siyuan Yan et.al. | 2505.07858 | null |
| 2025-05-12 | SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models | Hang Wu et.al. | 2505.07680 | null |
| 2025-05-12 | LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning | Xiaotian Lin et.al. | 2505.07437 | link |
| 2025-05-12 | Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity | Guang Yan et.al. | 2505.07239 | null |
| 2025-05-12 | PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications | Kuntai Du et.al. | 2505.07203 | null |
| 2025-06-15 | I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference | Zibo Gao et.al. | 2505.06738 | null |
| 2025-05-09 | Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference | Haolin Zhang et.al. | 2505.06461 | null |
| 2025-04-30 | Towards Efficient LLM Storage Reduction via Tensor Deduplication and Delta Compression | Zirui Wang et.al. | 2505.06252 | null |
| 2025-05-09 | Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM | Zehao Fan et.al. | 2505.05772 | null |
| 2025-05-08 | PRIMG : Efficient LLM-driven Test Generation Using Mutant Prioritization | Mohamed Salah Bouafif et.al. | 2505.05584 | link |
| 2025-05-08 | HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow | You Peng et.al. | 2505.05286 | link |
| 2025-05-12 | Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving | Shan Yu et.al. | 2505.04021 | null |
| 2025-05-31 | LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection | Xinyue Zeng et.al. | 2505.03793 | link |
| 2025-05-15 | GPU Performance Portability needs Autotuning | Burkhard Ringlein et.al. | 2505.03780 | link |
| 2025-04-21 | Splitwiser: Efficient LM inference with constrained resources | Asad Aali et.al. | 2505.03763 | link |
| 2025-04-07 | AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design | Yanbiao Liang et.al. | 2505.03745 | null |
| 2025-05-06 | Faster MoE LLM Inference for Extremely Large Models | Haoqi Yang et.al. | 2505.03531 | null |
| 2025-05-16 | 34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery | Yoel Zimmermann et.al. | 2505.03049 | null |
| 2025-06-30 | RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference | Yaoqi Chen et.al. | 2505.02922 | null |
| 2025-05-06 | EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices | Arnab Sanyal et.al. | 2505.02380 | null |
| 2025-05-03 | Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients | Yezhen Wang et.al. | 2505.01744 | null |
| 2025-05-03 | High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers | Brian Wong et.al. | 2505.01693 | null |
| 2025-05-08 | A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency | Sihyeong Park et.al. | 2505.01658 | link |
| 2025-05-02 | PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding | Bradley McDanel et.al. | 2505.01572 | null |
| 2025-05-01 | Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models | Andrew Adiletta et.al. | 2505.00817 | null |
| 2025-04-29 | Efficient LLMs with AMP: Attention Heads and MLP Pruning | Leandro Giusti Mugnaini et.al. | 2504.21174 | null |
| 2025-04-29 | Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts | Hanhua Hong et.al. | 2504.21117 | null |
| 2025-04-30 | Ascendra: Dynamic Request Prioritization for Efficient LLM Serving | Azam Ikram et.al. | 2504.20828 | null |
| 2025-04-30 | GenTorrent: Scaling Large Language Model Serving with An Overlay Network | Fei Fang et.al. | 2504.20101 | null |
| 2025-04-24 | Tempo: Application-aware LLM Serving with Mixed SLO Requirements | Wei Zhang et.al. | 2504.20068 | null |
| 2025-04-28 | AutoJudge: Judge Decoding Without Manual Annotation | Roman Garipov et.al. | 2504.20039 | null |
| 2025-04-28 | semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage | Ke Hong et.al. | 2504.19867 | null |
| 2025-04-28 | Taming the Titans: A Survey of Efficient LLM Inference Serving | Ranran Zhen et.al. | 2504.19720 | link |
| 2025-04-28 | Bullet: Boosting GPU Utilization for LLM Serving via Dynamic Spatial-Temporal Orchestration | Zejia Lin et.al. | 2504.19516 | null |
| 2025-04-28 | R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference | Zhenyu Zhang et.al. | 2504.19449 | null |
| 2025-04-28 | Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory | Prateek Chhikara et.al. | 2504.19413 | null |
| 2025-05-07 | A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification | Junichiro Niimi et.al. | 2504.18884 | link |
| 2025-06-15 | PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation | Zihao An et.al. | 2504.18583 | null |
| 2025-04-25 | EcoServe: Enabling Cost-effective LLM Serving with Proactive Intra- and Inter-Instance Orchestration | Jiangsu Du et.al. | 2504.18154 | null |
| 2025-04-25 | PropRAG: Guiding Retrieval with Beam Search over Proposition Paths | Jingjin Wang et.al. | 2504.18070 | null |
| 2025-04-25 | Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving | Chang Xiao et.al. | 2504.17999 | null |
| 2025-04-24 | Energy Considerations of Large Language Model Inference and Efficiency Optimizations | Jared Fernandez et.al. | 2504.17674 | null |
| 2025-04-24 | L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference | Qingyuan Liu et.al. | 2504.17584 | null |
| 2025-04-24 | A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task | Jiaqi Deng et.al. | 2504.17547 | null |
| 2025-04-24 | On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration | Maoyang Xiang et.al. | 2504.17376 | null |
| 2025-04-26 | QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining | Fengze Liu et.al. | 2504.16511 | null |
| 2025-04-18 | HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing | Myunghyun Rhee et.al. | 2504.16112 | null |
| 2025-05-29 | Optimizing Token Consumption in LLMs: A Nano Surge Approach for Code Reasoning Efficiency | Junwei Hu et.al. | 2504.15989 | null |
| 2025-04-22 | SeaLLM: Service-Aware and Latency-Optimized Resource Sharing for Large Language Model Inference | Yihao Zhao et.al. | 2504.15720 | null |
| 2025-04-23 | A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings | Md Millat Hosen et.al. | 2504.15610 | link |
| 2025-04-21 | Speculative Sampling via Exponential Races | Szymon Kobus et.al. | 2504.15475 | null |
| 2025-05-20 | KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments | Junyoung Park et.al. | 2504.15364 | null |
| 2025-04-18 | High-Throughput LLM inference on Heterogeneous Clusters | Yi Xiong et.al. | 2504.15303 | null |
| 2025-04-17 | D²MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | Haodong Wang et.al. | 2504.15299 | null |
| 2025-06-12 | SLO-Aware Scheduling for Large Language Model Inferences | Jinqi Huang et.al. | 2504.14966 | null |
| 2025-04-21 | Hardware-based Heterogeneous Memory Management for Large Language Model Inference | Soojin Hwang et.al. | 2504.14893 | null |
| 2025-05-28 | gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling | Tianyu Guo et.al. | 2504.14775 | link |
| 2025-04-20 | Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions | Luyang Fang et.al. | 2504.14772 | null |
| 2025-04-22 | Optimizing SLO-oriented LLM Serving with PD-Multiplexing | Weihao Cui et.al. | 2504.14489 | null |
| 2025-04-19 | Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator | Akshat Ramachandran et.al. | 2504.14365 | null |
| 2025-04-19 | FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference | Coleman Hooper et.al. | 2504.14152 | null |
| 2025-05-12 | From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs | Jiliang Ni et.al. | 2504.13471 | null |
| 2025-05-23 | The Quantum LLM: Modeling Semantic Spaces with Quantum Principles | Timo Aukusti Laine et.al. | 2504.13202 | null |
| 2025-04-25 | Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving | Yaoyao Ding et.al. | 2504.12984 | null |
| 2025-04-17 | Data-efficient LLM Fine-tuning for Code Generation | Weijie Lv et.al. | 2504.12687 | link |
| 2025-04-16 | Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading | Kihyun Kim et.al. | 2504.11816 | link |
| 2025-04-16 | Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs | Hyungwoo Lee et.al. | 2504.11765 | null |
| 2025-04-16 | Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures | Prabhu Vellaisamy et.al. | 2504.11750 | null |
| 2025-04-16 | Progent: Programmable Privilege Control for LLM Agents | Tianneng Shi et.al. | 2504.11703 | link |
| 2025-04-15 | Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Ruicheng Ao et.al. | 2504.11320 | link |
| 2025-04-14 | HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving | Avinash Kumar et.al. | 2504.10724 | null |
| 2025-04-14 | Load Balancing with Network Latencies via Distributed Gradient Descent | Santiago R. Balseiro et.al. | 2504.10693 | null |
| 2025-04-15 | AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference | Yangshen Deng et.al. | 2504.10326 | null |
| 2025-04-14 | KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference | Yuxuan Tian et.al. | 2504.09936 | null |
| 2025-04-20 | Understanding and Optimizing Multi-Stage AI Inference Pipelines | Abhimanyu Rajeshkumar Bambhaniya et.al. | 2504.09775 | null |
| 2025-04-13 | Integrating Large Language Models for Automated Structural Analysis | Haoran Liang et.al. | 2504.09754 | null |
| 2025-04-13 | Efficient LLM Serving on Hybrid Real-time and Best-effort Requests | Wan Borui et.al. | 2504.09590 | null |
| 2025-04-13 | LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference | Jianing Zheng et.al. | 2504.09561 | link |
| 2025-04-12 | MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints | Yichao Yuan et.al. | 2504.09345 | null |
| 2025-05-22 | DynaServe: Unified and Elastic Execution for Dynamic Disaggregated LLM Serving | Chaoyi Ruan et.al. | 2504.09285 | null |
| 2025-04-11 | An Adaptive Vector Index Partitioning Scheme for Low-Latency RAG Pipeline | Junkyum Kim et.al. | 2504.08930 | null |
| 2025-04-11 | SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting | Jiaming Xu et.al. | 2504.08850 | null |
| 2025-05-31 | SD²: Self-Distilled Sparse Drafters | Mike Lasby et.al. | 2504.08838 | null |
| 2025-04-07 | PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters | Zonghang Li et.al. | 2504.08791 | link |
| 2025-04-11 | Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash | Fucheng Jia et.al. | 2504.08378 | null |
| 2025-04-11 | Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices | Shengyuan Ye et.al. | 2504.08242 | null |
| 2025-04-10 | Token Level Routing Inference System for Edge Devices | Jianshu She et.al. | 2504.07878 | null |
| 2025-04-10 | A System for Comprehensive Assessment of RAG Frameworks | Mattia Rengo et.al. | 2504.07803 | link |
| 2025-04-11 | Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving | Shihong Gao et.al. | 2504.07494 | null |
| 2025-04-10 | UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference | Weikai Xu et.al. | 2504.07479 | null |
| 2025-04-24 | Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents | Yueying Li et.al. | 2504.07347 | null |
| 2025-04-08 | S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning | Hanqing Zeng et.al. | 2504.06426 | null |
| 2025-04-08 | SPIRe: Boosting LLM Inference Throughput with Speculative Decoding | Sanjit Neelam et.al. | 2504.06419 | null |
| 2025-04-08 | Mosaic: Composite Projection Pruning for Resource-efficient LLMs | Bailey J. Eccles et.al. | 2504.06323 | null |
| 2025-04-08 | Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching | Yanhao Dong et.al. | 2504.06319 | null |
| 2025-05-23 | Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Gleb Rodionov et.al. | 2504.06261 | null |
| 2025-05-27 | User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems | Jianling Wang et.al. | 2504.05522 | null |
| 2025-04-07 | REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding | Sakib Reza et.al. | 2504.05491 | null |
| 2025-04-07 | Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness | Dongzhuoran Zhou et.al. | 2504.05163 | null |
| 2025-05-20 | Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning | Sugyeong Eo et.al. | 2504.05047 | null |
| 2025-04-05 | PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models | Haofei Yin et.al. | 2504.04104 | null |
| 2025-04-03 | FlowKV: A Disaggregated Inference Framework with Low-Latency KV Cache Transfer and Load-Aware Scheduling | Weiqing Li et.al. | 2504.03775 | null |
| 2025-03-30 | VFlow: Discovering Optimal Agentic Workflows for Verilog Generation | Yangbo Wei et.al. | 2504.03723 | null |
| 2025-04-08 | MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization | Zongwu Wang et.al. | 2504.03661 | link |
| 2025-03-01 | Echo: Efficient Co-Scheduling of Hybrid Online-Offline Tasks for Large Language Model Serving | Zhibin Wang et.al. | 2504.03651 | null |
| 2025-02-22 | AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure | The AIBrix Team et.al. | 2504.03648 | null |
| 2025-04-04 | Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | Erik Johannes Husom et.al. | 2504.03360 | null |
| 2025-04-04 | Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation | Weitao Li et.al. | 2504.03165 | link |
| 2025-04-03 | Narrative Studio: Visual narrative exploration using LLMs and Monte Carlo Tree Search | Parsa Ghaffari et.al. | 2504.02426 | link |
| 2025-04-01 | SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching | Yuxuan Zhu et.al. | 2504.00970 | null |
| 2025-06-04 | Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding | Aayush Gautam et.al. | 2504.00030 | null |
| 2025-03-31 | TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance | Jingxian Xu et.al. | 2503.24198 | null |
| 2025-04-06 | ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance | Tong Xie et.al. | 2503.24053 | link |
| 2025-03-31 | Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving | Wei Gao et.al. | 2503.24000 | link |
| 2025-03-31 | Model Hemorrhage and the Robustness Limits of Large Language Models | Ziyang Ma et.al. | 2503.23924 | null |
| 2025-03-31 | MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration | Tatsuya Kubo et.al. | 2503.23817 | null |
| 2025-03-30 | Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference | Wei Tao et.al. | 2503.23294 | null |
| 2025-03-30 | PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference | Weisheng Jin et.al. | 2503.23274 | link |
| 2025-03-28 | Niyama : Breaking the Silos of LLM Inference Serving | Kanishk Goel et.al. | 2503.22562 | null |
| 2025-03-26 | Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation | Yunkai Liang et.al. | 2503.20552 | link |
| 2025-03-25 | LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Han Chen et.al. | 2503.19950 | link |
| 2025-03-24 | LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment | Varsha Embar et.al. | 2503.19090 | null |
| 2025-03-23 | SplitFrozen: Split Learning with Device-side Model Frozen for Fine-Tuning LLM on Heterogeneous Resource-Constrained Devices | Jian Ma et.al. | 2503.18986 | null |
| 2025-03-24 | xKV: Cross-Layer SVD for KV-Cache Compression | Chi-Chih Chang et.al. | 2503.18893 | link |
| 2025-04-21 | Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design | Rui Xie et.al. | 2503.18869 | null |
| 2025-05-14 | Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization | Minsu Kim et.al. | 2503.18599 | null |
| 2025-03-24 | DeepFund: Will LLM be Professional at Fund Investment? A Live Arena Perspective | Changlun Li et.al. | 2503.18313 | null |
| 2025-03-24 | Jenga: Effective Memory Management for Serving LLM with Heterogeneity | Chen Zhang et.al. | 2503.18292 | null |
| 2025-03-27 | WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference | Youhui Zuo et.al. | 2503.17922 | link |
| 2025-03-22 | PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling | Chongpeng Liu et.al. | 2503.17707 | null |
| 2025-03-21 | V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms | Javier J. Poveda Rodrigo et.al. | 2503.17422 | null |
| 2025-03-21 | Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation | Jingzhi Fang et.al. | 2503.16893 | null |
| 2025-05-16 | KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse | Huan Yang et.al. | 2503.16525 | null |
| 2025-03-20 | SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models | Fahao Chen et.al. | 2503.15921 | null |
| 2025-03-19 | Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study | Jomar Thomas Almonte et.al. | 2503.15248 | null |
| 2025-04-15 | ELTEX: A Framework for Domain-Driven Synthetic Data Generation | Arina Razmyslovich et.al. | 2503.15055 | link |
| 2025-03-19 | FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | Chongjun Tu et.al. | 2503.14935 | null |
| 2025-03-19 | Communication-Efficient Distributed On-Device LLM Inference Over Wireless Networks | Kai Zhang et.al. | 2503.14882 | null |
| 2025-03-21 | RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving | Wenqi Jiang et.al. | 2503.14649 | null |
| 2025-03-18 | PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play | Wei Fang et.al. | 2503.14432 | null |
| 2025-03-24 | Mitigating KV Cache Competition to Enhance User Experience in LLM Inference | Haiying Shen et.al. | 2503.13773 | null |
| 2025-03-17 | AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications | Haiying Shen et.al. | 2503.13737 | null |
| 2025-03-17 | ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts | Evangelos Georganas et.al. | 2503.13565 | null |
| 2025-03-14 | Examples as the Prompt: A Scalable Approach for Efficient LLM Adaptation in E-Commerce | Jingying Zeng et.al. | 2503.13518 | null |
| 2025-03-17 | xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Maximilian Beck et.al. | 2503.13427 | link |
| 2025-04-14 | VeriLeaky: Navigating IP Protection vs Utility in Fine-Tuning for LLM-Driven Verilog Coding | Zeng Wang et.al. | 2503.13116 | null |
| 2025-03-15 | TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation | Mayank Kumar et.al. | 2503.12217 | null |
| 2025-04-22 | Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques | Neusha Javidnia et.al. | 2503.11816 | null |
| 2025-05-19 | D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning | Jia Zhang et.al. | 2503.11441 | null |
| 2025-03-14 | MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | Jeong Hun Yeo et.al. | 2503.11315 | link |
| 2025-04-08 | Green Prompting | Marta Adamska et.al. | 2503.10666 | null |
| 2025-05-16 | Collaborative Speculative Inference for Efficient LLM Inference Serving | Luyao Gao et.al. | 2503.10325 | null |
| 2025-03-17 | Exploiting Edited Large Language Models as General Scientific Optimizers | Qitan Lv et.al. | 2503.09620 | null |
| 2025-03-13 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | Md Mohaiminul Islam et.al. | 2503.09590 | link |
| 2025-05-23 | Prompt Inference Attack on Distributed Large Language Model Inference Frameworks | Xinjian Luo et.al. | 2503.09291 | null |
| 2025-05-02 | Prompt Inversion Attack against Collaborative Inference of Large Language Models | Wenjie Qu et.al. | 2503.09022 | null |
| 2025-03-19 | Enhancing High-Quality Code Generation in Large Language Models with Comparative Prefix-Tuning | Yuan Jiang et.al. | 2503.09020 | link |
| 2025-03-11 | Position-Aware Depth Decay Decoding (D³): Boosting Large Language Model Inference Efficiency | Siqi Fan et.al. | 2503.08524 | null |
| 2025-03-11 | FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework | Jianian Zhu et.al. | 2503.08461 | null |
| 2025-03-19 | TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems | Feiyang Wu et.al. | 2503.08415 | link |
| 2025-03-11 | Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference | Pol G. Recasens et.al. | 2503.08311 | null |
| 2025-03-09 | Seesaw: High-throughput LLM Inference via Model Re-sharding | Qidong Su et.al. | 2503.06433 | null |
| 2025-02-24 | Encoding Inequity: Examining Demographic Bias in LLM-Driven Robot Caregiving | Raj Korpan et.al. | 2503.05765 | null |
| 2025-03-07 | Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching | Bowen Pang et.al. | 2503.05248 | link |
| 2025-05-21 | Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching | Simon A. Aytes et.al. | 2503.05179 | link |
| 2025-03-07 | SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding | Kaiyu Huang et.al. | 2503.05096 | null |
| 2025-03-07 | Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size | Alireza Behtash et.al. | 2503.04704 | null |
| 2025-03-15 | Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking | Yijie Xu et.al. | 2503.04636 | null |
| 2025-03-06 | AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services | Xiaoqi Wang et.al. | 2503.04418 | null |
| 2025-03-06 | Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search | Kou Misaki et.al. | 2503.04412 | null |
| 2025-03-06 | ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput | Junsoo Kim et.al. | 2503.04253 | null |
| 2025-03-06 | Beyond Memorization: Evaluating the True Type Inference Capabilities of LLMs for Java Code Snippets | Yiwen Dong et.al. | 2503.04076 | null |
| 2025-03-04 | FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference | Hongchao Du et.al. | 2503.03777 | null |
| 2025-03-05 | MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems | Rui Ye et.al. | 2503.03686 | null |
| 2025-03-05 | Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems | Yaoru Li et.al. | 2503.03505 | link |
| 2025-03-05 | Enhancing Memory Efficiency in Large Language Model Training Through Chronos-aware Pipeline Parallelism | Xinyuan Lin et.al. | 2503.03182 | null |
| 2025-03-04 | PersonaX: A Recommendation Agent Oriented User Modeling Framework for Long Behavior Sequence | Yunxiao Shi et.al. | 2503.02398 | link |
| 2025-03-04 | VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference | Zihan Liu et.al. | 2503.02236 | null |
| 2025-02-26 | Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis | Long Cheng et.al. | 2503.01873 | null |
| 2025-04-30 | SAGE: A Framework of Precise Retrieval for RAG | Jintao Zhang et.al. | 2503.01713 | null |
| 2025-03-03 | Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Xinsheng Wang et.al. | 2503.01710 | link |
| 2025-03-03 | DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems | Minoo Hosseinzadeh et.al. | 2503.01704 | null |
| 2025-03-15 | Towards An Efficient LLM Training Paradigm for CTR Prediction | Allen Lin et.al. | 2503.01001 | null |
| 2025-03-02 | Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers | Yiran Zhao et.al. | 2503.00865 | null |
| 2025-03-01 | Tutorial Proposal: Speculative Decoding for Efficient LLM Inference | Heming Xia et.al. | 2503.00491 | null |
| 2025-03-04 | Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving | Qihui Zhou et.al. | 2503.00392 | null |
| 2025-02-28 | FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference | Xunhao Lai et.al. | 2502.20766 | link |
| 2025-05-04 | SPD: Sync-Point Drop for efficient tensor parallelism of Large Language Models | Han-Byul Kim et.al. | 2502.20727 | null |
| 2025-04-02 | Smart Routing: Cost-Effective Multi-LLM Serving for Multi-Core AIOS | Kai Mei et.al. | 2502.20576 | link |
| 2025-02-27 | M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging | Jinghao Feng et.al. | 2502.20301 | null |
| 2025-02-26 | Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs | Yiheng Yang et.al. | 2502.19078 | null |
| 2025-02-26 | Evidence-Driven Marker Extraction for Social Media Suicide Risk Detection | Carter Adams et.al. | 2502.18823 | null |
| 2025-02-24 | LLM Inference Acceleration via Efficient Operation Fusion | Mahsa Salmani et.al. | 2502.17728 | null |
| 2025-02-24 | CodeSwift: Accelerating LLM Inference for Efficient Code Generation | Qianhui Zhao et.al. | 2502.17139 | null |
| 2025-02-24 | Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM | Lian Liu et.al. | 2502.16963 | null |
| 2025-02-24 | DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance | Xuanfan Ni et.al. | 2502.16886 | null |
| 2025-03-01 | CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter | Yepeng Weng et.al. | 2502.16880 | null |
| 2025-02-23 | DISC: Dynamic Decomposition Improves LLM Inference Scaling | Jonathan Light et.al. | 2502.16706 | null |
| 2025-02-23 | Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines | Xinwei Long et.al. | 2502.16641 | null |
| 2025-05-01 | TerEffic: Highly Efficient Ternary LLM Inference on FPGA | Chenyang Yin et.al. | 2502.16473 | null |
| 2025-02-27 | Dynamic Parallel Tree Search for Efficient LLM Reasoning | Yifu Ding et.al. | 2502.16235 | null |
| 2025-02-21 | KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse | Jingbo Yang et.al. | 2502.16002 | link |
| 2025-02-14 | Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | Bowen Pang et.al. | 2502.15763 | null |
| 2025-02-21 | Towards Swift Serverless LLM Cold Starts with ParaServe | Chiheng Lou et.al. | 2502.15524 | null |
| 2025-02-24 | HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings | Rasmus Aavang et.al. | 2502.15411 | link |
| 2025-02-24 | Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference | Yaohua Tang et.al. | 2502.15294 | null |
| 2025-02-21 | A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation | Shilong Hou et.al. | 2502.15233 | link |
| 2025-02-19 | EvoP: Robust LLM Inference via Evolutionary Pruning | Shangyu Wu et.al. | 2502.14910 | null |
| 2025-04-21 | LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention | Shang Yang et.al. | 2502.14866 | link |
| 2025-02-20 | Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale | Shashwat Jaiswal et.al. | 2502.14617 | null |
| 2025-02-20 | SR-LLM: Rethinking the Structured Representation in Large Language Model | Jiahuan Zhang et.al. | 2502.14352 | null |
| 2025-02-20 | Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications | Kayhan Behdin et.al. | 2502.14305 | null |
| 2025-02-19 | RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression | Payman Behnam et.al. | 2502.14051 | null |
| 2025-02-19 | Autellix: An Efficient Serving Engine for LLM Agents as General Programs | Michael Luo et.al. | 2502.13965 | null |
| 2025-02-19 | Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference | Qingfa Xiao et.al. | 2502.13542 | null |
| 2025-02-19 | What are Models Thinking about? Understanding Large Language Model Hallucinations “Psychology” through Model Inner State Analysis | Peiran Wang et.al. | 2502.13490 | null |
| 2025-02-24 | BaKlaVa – Budgeted Allocation of KV cache for Long-context Inference | Ahmed Burak Gulhan et.al. | 2502.13176 | null |
| 2025-02-18 | SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems | Mike Zhang et.al. | 2502.12927 | link |
| 2025-03-27 | R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs | Sumin Jo et.al. | 2502.12767 | link |
| 2025-02-18 | HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | Cheng Luo et.al. | 2502.12574 | link |
| 2025-02-18 | Distributed On-Device LLM Inference With Over-the-Air Computation | Kai Zhang et.al. | 2502.12559 | null |
| 2025-02-18 | SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs | Ahmed F. AbouElhamayed et.al. | 2502.12444 | link |
| 2025-02-17 | Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs | Kan Zhu et.al. | 2502.12216 | null |
| 2025-02-17 | Designing Role Vectors to Improve LLM Inference Behaviour | Daniele Potertì et.al. | 2502.12055 | null |
| 2025-02-17 | DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services | Ting Sun et.al. | 2502.11417 | null |
| 2025-02-17 | Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment | Ben Dong et.al. | 2502.11347 | null |
| 2025-02-16 | Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View | Yanran Wu et.al. | 2502.11256 | null |
| 2025-02-16 | Diversified Sampling Improves Scaling LLM inference | Tianchun Wang et.al. | 2502.11027 | null |
| 2025-02-16 | Leveraging Uncertainty Estimation for Efficient LLM Routing | Tuo Zhang et.al. | 2502.11021 | null |
| 2025-04-07 | Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings | Liangqi Yuan et.al. | 2502.11007 | link |
| 2025-02-15 | Pushing up to the Limit of Memory Bandwidth and Capacity Utilization for Efficient LLM Decoding on Embedded FPGA | Jindong Li et.al. | 2502.10659 | null |
| 2025-02-05 | QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache | Rishabh Tiwari et.al. | 2502.10424 | null |
| 2025-02-14 | λScale: Enabling Fast Scaling for Serverless Large Language Model Inference | Minchen Yu et.al. | 2502.09922 | null |
| 2025-02-14 | INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing | Hongsun Jang et.al. | 2502.09921 | null |
| 2025-02-13 | On multi-token prediction for efficient LLM inference | Somesh Mehra et.al. | 2502.09419 | null |
| 2025-02-13 | ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments | Youhe Jiang et.al. | 2502.09334 | null |
| 2025-03-21 | RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models | Quan Wei et.al. | 2502.09003 | null |
| 2025-02-13 | InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU | Heejun Lee et.al. | 2502.08910 | null |
| 2025-02-13 | DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation | Tangyu Jiang et.al. | 2502.08905 | null |
| 2025-02-12 | Universal Model Routing for Efficient LLM Inference | Wittawat Jitkrittum et.al. | 2502.08773 | null |
| 2025-02-12 | MoLoRec: A Generalizable and Efficient Framework for LLM-Based Recommendation | Min Hou et.al. | 2502.08271 | null |
| 2025-02-12 | Memory Offloading for Large Language Model Inference with Latency SLO Guarantees | Chenxiang Ma et.al. | 2502.08182 | null |
| 2025-02-12 | Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences | Shanshan Han et.al. | 2502.08142 | null |
| 2025-03-19 | Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding | Ziyao Wang et.al. | 2502.08020 | null |
| 2025-02-11 | HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment | Youhe Jiang et.al. | 2502.07903 | null |
| 2025-02-11 | SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters | Yiping Wang et.al. | 2502.07832 | null |
| 2025-03-21 | PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference | Yufeng Gu et.al. | 2502.07578 | link |
| 2025-03-05 | Online Scheduling for LLM Inference with KV Cache Constraints | Patrick Jaillet et.al. | 2502.07115 | null |
| 2025-02-10 | Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Haiduo Huang et.al. | 2502.06282 | link |
| 2025-03-15 | Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models | Soham Poddar et.al. | 2502.05610 | null |
| 2025-02-08 | Mechanistic Interpretability of Emotion Inference in Large Language Models | Ala N. Tak et.al. | 2502.05489 | null |
| 2025-02-07 | BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference | Reena Elangovan et.al. | 2502.05376 | null |
| 2025-01-31 | Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies | Nadav Timor et.al. | 2502.05202 | null |
| 2025-03-15 | EcoServe: Designing Carbon-Aware AI Inference Systems | Yueying Li et.al. | 2502.05043 | null |
| 2025-02-07 | LLM Query Scheduling with Prefix Reuse and Latency Constraints | Gregory Dexter et.al. | 2502.04677 | null |
| 2025-02-18 | WaferLLM: A Wafer-Scale LLM Inference System | Congjie He et.al. | 2502.04563 | null |
| 2025-02-25 | KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference | Xing Li et.al. | 2502.04420 | link |
| 2025-02-06 | CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Zehua Pei et.al. | 2502.04416 | link |
| 2025-02-11 | Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing | Kunfeng Lai et.al. | 2502.04411 | null |
| 2025-02-26 | AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference | Qingyue Yang et.al. | 2502.04077 | link |
| 2025-02-06 | CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing | Yu Yuan et.al. | 2502.03997 | null |
| 2025-02-06 | Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective | Yuan Feng et.al. | 2502.03805 | link |
| 2025-04-04 | Adaptive Semantic Prompt Caching with VectorQ | Luis Gaspar Schroeder et.al. | 2502.03771 | null |
| 2025-02-05 | Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training | Reza Shirkavand et.al. | 2502.03604 | null |
| 2025-02-05 | HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference | Zeyu Zhang et.al. | 2502.03589 | null |
| 2025-02-05 | Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL | Wenbo Sun et.al. | 2502.02818 | null |
| 2025-02-05 | Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation | Jingyu Liu et.al. | 2502.02789 | link |
| 2025-02-04 | LLM Bandit: Cost-Efficient LLM Generation via Preference-Conditioned Dynamic Routing | Yang Li et.al. | 2502.02743 | null |
| 2025-02-04 | EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization | Yize Wu et.al. | 2502.02493 | null |
| 2025-01-30 | Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency | Sazzad Hossain et.al. | 2502.01651 | null |
| 2025-02-06 | An Investigation of FP8 Across Accelerators for LLM Inference | Jiwoo Kim et.al. | 2502.01070 | null |
| 2025-02-02 | Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference | Patrick Yubeaton et.al. | 2502.00922 | null |
| 2025-02-02 | MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies | Ehsaneddin Asgari et.al. | 2502.00894 | null |
| 2025-02-02 | SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models | Jiawen Zhang et.al. | 2502.00847 | null |
| 2025-02-02 | Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs | Youhe Jiang et.al. | 2502.00722 | null |
| 2025-02-13 | Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning | Zhi Zhou et.al. | 2502.00511 | null |
| 2025-02-01 | UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs | Yizhe Xiong et.al. | 2502.00439 | null |
| 2025-02-01 | ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | Xiang Liu et.al. | 2502.00299 | null |
| 2025-01-16 | Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models | Tom Wallace et.al. | 2502.00046 | null |
| 2025-02-07 | Pushing the Limits of BFP on Narrow Precision LLM Inference | Hui Wang et.al. | 2502.00026 | null |
| 2025-02-14 | Reward-Guided Speculative Decoding for Efficient LLM Reasoning | Baohao Liao et.al. | 2501.19324 | null |
| 2025-01-31 | Pheromone-based Learning of Optimal Reasoning Paths | Anirudh Chari et.al. | 2501.19278 | null |
| 2025-01-31 | Structural Embedding Projection for Contextual Large Language Model Inference | Vincent Enoasmo et.al. | 2501.18826 | null |
| 2025-01-29 | On the Partitioning of GPU Power among Multi-Instances | Tirth Vamja et.al. | 2501.17752 | null |
| 2025-02-02 | RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations | Zunhai Su et.al. | 2501.16383 | null |
| 2025-01-27 | Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs | Antony Bartlett et.al. | 2501.16191 | null |
| 2025-01-27 | TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference | Jack Min Ong et.al. | 2501.16007 | null |
| 2025-01-27 | Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference | Tharindu B. Hewage et.al. | 2501.15829 | link |
| 2025-01-25 | Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads | Xingyang He et.al. | 2501.15113 | null |
| 2025-01-25 | PatchRec: Multi-Grained Patching for Efficient LLM-based Sequential Recommendation | Jiayi Liao et.al. | 2501.15087 | null |
| 2025-02-09 | HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location | Ting Sun et.al. | 2501.14808 | null |
| 2025-01-11 | HeteroLLM: Accelerating Large Language Model Inference on Mobile SoCs platform with Heterogeneous AI Accelerators | Le Chen et.al. | 2501.14794 | null |
| 2025-01-04 | DeServe: Towards Affordable Offline LLM Inference via Decentralization | Linyu Wu et.al. | 2501.14784 | null |
| 2024-12-13 | KVDirect: Distributed Disaggregated LLM Inference | Shiyang Chen et.al. | 2501.14743 | null |
| 2025-01-24 | Accelerated Preference Elicitation with LLM-Based Proxies | David Huang et.al. | 2501.14625 | null |
| 2025-01-27 | DeepFlow: Serverless Large Language Model Serving at Scale | Junhao Hu et.al. | 2501.14417 | null |
| 2025-01-24 | Locality-aware Fair Scheduling in LLM Serving | Shiyi Cao et.al. | 2501.14312 | null |
| 2025-01-27 | Serving Long-Context LLMs at the Mobile Edge: Test-Time Reinforcement Learning-based Model Caching and Inference Offloading | Minrui Xu et.al. | 2501.14205 | null |
| 2025-01-08 | iServe: An Intent-based Serving System for LLMs | Dimitrios Liakopoulos et.al. | 2501.13111 | null |
| 2025-01-24 | EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation | Yifan Yu et.al. | 2501.12689 | null |
| 2025-03-16 | Human-like conceptual representations emerge from language prediction | Ningyu Xu et.al. | 2501.12547 | null |
| 2025-01-21 | AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding | Zikun Li et.al. | 2501.12162 | null |
| 2025-02-11 | Glinthawk: A Two-Tiered Architecture for Offline LLM Inference | Pouya Hamadanian et.al. | 2501.11779 | link |
| 2025-01-20 | Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas | Nishant Balepur et.al. | 2501.11549 | link |
| 2025-03-21 | GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation | Shashikant Ilager et.al. | 2501.11006 | link |
| 2025-03-06 | A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks | Xinzhe Li et.al. | 2501.10069 | link |
| 2025-01-16 | PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks | Huiyou Zhan et.al. | 2501.09367 | null |
| 2025-01-16 | Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition | Takaaki Hori et.al. | 2501.09258 | null |
| 2025-01-16 | Split Fine-Tuning for Large Language Models in Wireless Networks | Songge Zhang et.al. | 2501.09237 | null |
| 2025-01-15 | Guiding Retrieval using LLM-based Listwise Rankers | Mandeep Rathee et.al. | 2501.09186 | link |
| 2025-01-14 | Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings | Paul Joe Maliakel et.al. | 2501.08219 | null |
| 2025-01-14 | PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving | Ahmet Caner Yüzügüler et.al. | 2501.08192 | null |
| 2025-01-14 | Hierarchical Autoscaling for Large Language Model Serving with Chiron | Archit Patke et.al. | 2501.08090 | null |
| 2025-01-12 | MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference | Wenxuan Zeng et.al. | 2501.06807 | null |
| 2025-01-12 | Mell: Memory-Efficient Large Language Model Serving via Multi-GPU KV Cache Management | Liu Qianli et.al. | 2501.06709 | null |
| 2025-02-07 | Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping | Muru Zhang et.al. | 2501.06589 | link |
| 2025-01-15 | Multimodal-to-Text Prompt Engineering in Large Language Models Using Feature Embeddings for GNSS Interference Characterization | Harshith Manjunath et.al. | 2501.05079 | null |
| 2025-02-08 | Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text | Ali Al-Lawati et.al. | 2501.03166 | link |
| 2025-01-05 | TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms | Jovan Stojkovic et.al. | 2501.02600 | null |
| 2025-01-04 | AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference | Zhuomin He et.al. | 2501.02336 | link |
| 2024-12-31 | Towards Sustainable Large Language Model Serving | Sophia Nguyen et.al. | 2501.01990 | null |
| 2025-01-03 | Efficient LLM Inference with Activation Checkpointing and Hybrid Caching | Sanghyeon Lee et.al. | 2501.01792 | null |
| 2025-01-03 | (WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges | Mohamed Hisham Abdellatif et.al. | 2501.01588 | null |
| 2025-01-21 | BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference | Wonsuk Jang et.al. | 2501.01144 | link |
| 2025-04-23 | FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Zihao Ye et.al. | 2501.01005 | null |
| 2025-02-25 | Rethinking Layer Removal: A Hybrid Pruning Framework Combining Layer Removal and Singular Value Selection for Efficient LLM Compression | Kainan Liu et.al. | 2501.00339 | null |
| 2024-12-23 | Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs | Dibakar Gope et.al. | 2501.00032 | link |
| 2024-12-29 | TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication | Zongwu Wang et.al. | 2412.20501 | link |
| 2024-12-29 | GreenLLM: Disaggregating Large Language Model Serving on Heterogeneous GPUs for Lower Carbon Emissions | Tianyao Shi et.al. | 2412.20322 | null |
| 2025-01-15 | LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System | Hyucksung Kwon et.al. | 2412.20166 | null |
| 2024-12-19 | GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors | Chengming Zhang et.al. | 2412.19829 | null |
| 2025-01-05 | Gradient Weight-normalized Low-rank Projection for Efficient LLM Training | Jia-Hong Huang et.al. | 2412.19616 | link |
| 2025-01-02 | A Survey on Large Language Model Acceleration based on KV Cache Management | Haoyang Li et.al. | 2412.19442 | link |
| 2025-02-13 | An Engorgio Prompt Makes Large Language Model Babble on | Jianshuo Dong et.al. | 2412.19394 | link |
| 2024-12-25 | Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference | Libo Zhang et.al. | 2412.18934 | null |
| 2024-12-24 | TimelyLLM: Segmented LLM Serving System for Time-sensitive Robotic Applications | Neiwen Ling et.al. | 2412.18695 | null |
| 2024-12-26 | KunServe: Elastic and Efficient Large Language Model Serving with Parameter-centric Memory Management | Rongxin Cheng et.al. | 2412.18169 | null |
| 2025-02-22 | Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media | Zhen Sun et.al. | 2412.18148 | null |
| 2024-12-24 | Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels | Mingcong Song et.al. | 2412.18106 | null |
| 2024-12-23 | Trustworthy and Efficient LLMs Meet Databases | Kyoungmin Kim et.al. | 2412.18022 | null |
| 2025-02-20 | GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference | Chao Zeng et.al. | 2412.17560 | null |
| 2025-02-18 | VilBias: A Study of Bias Detection through Linguistic and Visual Cues , presenting Annotation Strategies, Evaluation, and Key Challenges | Shaina Raza et.al. | 2412.17052 | link |
| 2024-12-21 | SYMPHONY: Improving Memory Management for LLM Inference Workloads | Saurabh Agarwal et.al. | 2412.16434 | null |
| 2024-12-20 | WebLLM: A High-Performance In-Browser LLM Inference Engine | Charlie F. Ruan et.al. | 2412.15803 | link |
| 2024-12-19 | Fietje: An open, efficient LLM for Dutch | Bram Vanroy et.al. | 2412.15450 | link |
| 2024-12-19 | PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization | Jiayi Wu et.al. | 2412.14510 | link |
| 2024-12-19 | Are Longer Prompts Always Better? Prompt Selection in Large Language Models for Recommendation Systems | Genki Kusano et.al. | 2412.14454 | null |
| 2024-12-18 | A Survey on LLM Inference-Time Self-Improvement | Xiangjue Dong et.al. | 2412.14352 | link |
| 2024-12-18 | Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models | Seungeun Oh et.al. | 2412.12687 | null |
| 2024-12-17 | A System for Microserving of LLMs | Hongyi Jin et.al. | 2412.12488 | null |
| 2024-12-17 | LITA: An Efficient LLM-assisted Iterative Topic Augmentation Framework | Chia-Hsuan Chang et.al. | 2412.12459 | null |
| 2024-12-16 | CSR:Achieving 1 Bit Key-Value Cache via Sparse Representation | Hongxuan Zhang et.al. | 2412.11741 | null |
| 2025-01-20 | FinLoRA: Finetuning Quantized Financial Large Language Models Using Low-Rank Adaptation | Dannong Wang et.al. | 2412.11378 | null |
| 2025-01-09 | Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning | Yun Qu et.al. | 2412.11120 | link |
| 2024-12-15 | NITRO: LLM Inference on Intel Laptop NPUs | Anthony Fei et.al. | 2412.11053 | link |
| 2025-03-11 | SCBench: A KV Cache-Centric Analysis of Long-Context Methods | Yucheng Li et.al. | 2412.10319 | null |
| 2024-12-17 | TurboAttention: Efficient Attention Approximation For High Throughputs LLMs | Hao Kang et.al. | 2412.08585 | null |
| 2024-12-11 | Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths | Naryeong Kim et.al. | 2412.08281 | null |
| 2024-12-12 | TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch | Xingchen Song et.al. | 2412.08237 | null |
| 2024-12-09 | Asynchronous LLM Function Calling | In Gim et.al. | 2412.07017 | null |
| 2024-12-08 | Taming Sensitive Weights : Noise Perturbation Fine-tuning for Robust LLM Quantization | Dongwei Wang et.al. | 2412.06858 | null |
| 2024-12-09 | JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Takuro Fujii et.al. | 2412.06738 | link |
| 2024-12-09 | SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs | James Vo et.al. | 2412.06198 | null |
| 2024-12-08 | XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference | Weizhuo Li et.al. | 2412.05896 | null |
| 2025-02-17 | APOLLO: SGD-like Memory, AdamW-level Performance | Hanqing Zhu et.al. | 2412.05270 | link |
| 2024-12-06 | Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale? | Seyed Amin Tabatabaei et.al. | 2412.05137 | null |
| 2024-12-11 | Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference | Qingyuan Li et.al. | 2412.04964 | null |
| 2025-01-26 | GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments | Yanyu Chen et.al. | 2412.04788 | null |
| 2024-12-09 | Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems | Ayush Gundawar et.al. | 2412.04569 | link |
| 2024-12-03 | Multi-Bin Batching for Increasing LLM Inference Throughput | Ozgur Guldogan et.al. | 2412.04504 | null |
| 2025-01-17 | BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching | Zhen Zheng et.al. | 2412.03594 | null |
| 2024-12-04 | Unifying KV Cache Compression for Large Language Models with LeanKV | Yanqi Zhang et.al. | 2412.03131 | null |
| 2024-12-03 | Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity | Da Ma et.al. | 2412.02252 | null |
| 2024-12-02 | Data-Centric and Heterogeneity-Adaptive Sequence Parallelism for Efficient LLM Training | Yujie Wang et.al. | 2412.01523 | null |
| 2024-12-02 | PLD+: Accelerating LLM inference by leveraging Language Model Artifacts | Shwetha Somasundaram et.al. | 2412.01447 | null |
| 2024-12-02 | Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking | Marco Federici et.al. | 2412.01380 | null |
| 2024-12-02 | Can Large Language Models Serve as Evaluators for Code Summarization? | Yang Wu et.al. | 2412.01333 | link |
| 2024-12-05 | RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy | Geonho Lee et.al. | 2412.01129 | null |
| 2024-12-02 | TruncFormer: Private LLM Inference Using Only Truncations | Patrick Yubeaton et.al. | 2412.01042 | null |
| 2024-11-25 | Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration | Zhuofan Wen et.al. | 2412.00061 | null |
| 2024-11-29 | A dynamic parallel method for performance optimization on hybrid CPUs | Luo Yu et.al. | 2411.19542 | null |
| 2024-12-04 | Marconi: Prefix Caching for the Era of Hybrid LLMs | Rui Pan et.al. | 2411.19379 | null |
| 2024-12-08 | Puzzle: Distillation-Based NAS for Inference-Optimized LLMs | Akhiad Bercovich et.al. | 2411.19146 | null |
| 2024-11-27 | FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Ao Shen et.al. | 2411.18424 | null |
| 2024-11-29 | InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks | Xinyao Zheng et.al. | 2411.18191 | null |
| 2024-11-28 | MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache | Akshat Sharma et.al. | 2411.18077 | null |
| 2024-11-24 | Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou et.al. | 2411.17741 | null |
| 2024-11-18 | Generative AI on the Edge: Architecture and Performance Evaluation | Zeinab Nezami et.al. | 2411.17712 | null |
| 2024-11-26 | Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism | Yi-Chien Lin et.al. | 2411.17651 | null |
| 2024-11-26 | PIM-AI: A Novel Architecture for High-Efficiency LLM Inference | Cristobal Ortega et.al. | 2411.17309 | null |
| 2024-11-26 | Star Attention: Efficient LLM Inference over Long Sequences | Shantanu Acharya et.al. | 2411.17116 | link |
| 2024-11-26 | Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation | Chaoyi Jiang et.al. | 2411.17089 | null |
| 2024-11-25 | MixPE: Quantization and Hardware Co-design for Efficient LLM Inference | Yu Zhang et.al. | 2411.16158 | null |
| 2024-11-24 | eFedLLM: Efficient LLM Inference Based on Federated Learning | Shengwen Ding et.al. | 2411.16003 | null |
| 2024-11-24 | Ensuring Fair LLM Serving Amid Diverse Applications | Redwan Ibne Seraj Khan et.al. | 2411.15997 | null |
| 2024-11-24 | Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format | Chao Fang et.al. | 2411.15982 | null |
| 2024-11-24 | Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems | Wenxiang Lin et.al. | 2411.15715 | null |
| 2024-11-26 | Enabling Efficient Serverless Inference Serving for LLM (Large Language Model) in the Cloud | Himel Ghosh et.al. | 2411.15664 | null |
| 2025-01-14 | AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution | Fengyuan Liu et.al. | 2411.15102 | link |
| 2024-11-27 | XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Yixin Dong et.al. | 2411.15100 | null |
| 2024-11-02 | Transforming Engineering Education Using Generative AI and Digital Twin Technologies | Yu-Zheng Lin et.al. | 2411.14433 | null |
| 2024-11-21 | InstCache: A Predictive Cache for LLM Serving | Longwei Zou et.al. | 2411.13820 | null |
| 2024-11-21 | Disentangling Memory and Reasoning Ability in Large Language Models | Mingyu Jin et.al. | 2411.13504 | link |
| 2024-11-27 | Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding | Hyun Ryu et.al. | 2411.13157 | null |
| 2024-11-21 | LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts | Zhuohan Gu et.al. | 2411.13009 | null |
| 2024-11-15 | An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2 | Pepijn de Reus et.al. | 2411.12758 | link |
| 2025-01-24 | SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference | Jiho Shin et.al. | 2411.12692 | null |
| 2024-11-18 | BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration | Yuzong Chen et.al. | 2411.11745 | link |
| 2024-11-18 | MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs | Shiyi Cao et.al. | 2411.11217 | null |
| 2024-11-17 | FastDraft: How to Train Your Draft | Ofir Zafrir et.al. | 2411.11055 | null |
| 2024-12-16 | SAM Decoding: Speculative Decoding via Suffix Automaton | Yuxuan Hu et.al. | 2411.10666 | link |
| 2024-11-15 | Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity | Zichen Song et.al. | 2411.10069 | null |
| 2024-11-15 | AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference | Janghwan Lee et.al. | 2411.09909 | null |
| 2024-11-23 | Squeezed Attention: Accelerating Long Context Length LLM Inference | Coleman Hooper et.al. | 2411.09688 | link |
| 2024-11-15 | Communication Compression for Tensor Parallel LLM Inference | Jan Hansen-Palmus et.al. | 2411.09510 | null |
| 2024-11-14 | Pie: Pooling CPU Memory for LLM Inference | Yi Xu et.al. | 2411.09317 | null |
| 2025-01-23 | Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism | Libo Wang et.al. | 2411.09111 | link |
| 2024-11-12 | Towards Low-bit Communication for Tensor Parallel LLM Inference | Harry Dong et.al. | 2411.07942 | null |
| 2024-12-12 | ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization | Weibo Zhao et.al. | 2411.07762 | null |
| 2025-01-08 | BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks | Shubham Gandhi et.al. | 2411.07464 | null |
| 2024-11-19 | The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving | Kyoungmin Kim et.al. | 2411.07447 | null |
| 2024-11-10 | EcoServe: Maximizing Multi-Resource Utilization with SLO Guarantees in LLM Serving | Haiying Shen et.al. | 2411.06364 | null |
| 2024-11-08 | SSSD: Simply-Scalable Speculative Decoding | Michele Marzollo et.al. | 2411.05894 | null |
| 2024-11-08 | AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality | Ilias Bournias et.al. | 2411.05555 | null |
| 2024-11-07 | Hardware and Software Platform Inference | Cheng Zhang et.al. | 2411.05197 | null |
| 2024-10-22 | Scattered Forest Search: Smarter Code Space Exploration with LLMs | Jonathan Light et.al. | 2411.05010 | null |
| 2024-11-07 | SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference | Gabriele Oliaro et.al. | 2411.04975 | null |
| 2024-11-05 | CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration | Hongpeng Jin et.al. | 2411.02829 | null |
| 2024-12-19 | DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving | Yuhan Liu et.al. | 2411.02820 | null |
| 2024-11-10 | Context Parallelism for Scalable Million-Token Inference | Amy Yang et.al. | 2411.01783 | null |
| 2024-11-04 | RAGViz: Diagnose and Visualize Retrieval-Augmented Generation | Tevin Wang et.al. | 2411.01751 | link |
| 2024-11-03 | Autoformulation of Mathematical Optimization Models Using LLMs | Nicolás Astorga et.al. | 2411.01679 | null |
| 2024-11-06 | HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference | Peng Tang et.al. | 2411.01433 | null |
| 2024-11-02 | RA-WEBs: Remote Attestation for WEB services | Kosei Akama et.al. | 2411.01340 | null |
| 2024-11-02 | NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference | Xuanlin Jiang et.al. | 2411.01142 | null |
| 2024-10-30 | A Theoretical Perspective for Speculative Decoding Algorithm | Ming Yin et.al. | 2411.00841 | null |
| 2024-11-01 | Phase Diagram of Vision Large Language Models Inference: A Perspective from Interaction across Image and Instruction | Houjing Wei et.al. | 2411.00646 | null |
| 2024-11-01 | LLM-Based Misconfiguration Detection for AWS Serverless Computing | Jinfeng Wen et.al. | 2411.00642 | null |
| 2024-12-08 | ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models | Anbang Wang et.al. | 2411.00533 | null |
| 2024-11-01 | Attention Tracker: Detecting Prompt Injection Attacks in LLMs | Kuo-Han Hung et.al. | 2411.00348 | null |
| 2024-10-31 | LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators | Krishna Teja Chitty-Venkata et.al. | 2411.00136 | link |
| 2024-10-31 | Interpretable Language Modeling via Induction-head Ngram Models | Eunji Kim et.al. | 2411.00066 | link |
| 2024-10-31 | ALISE: Accelerating Large Language Model Serving with Speculative Scheduling | Youpeng Zhao et.al. | 2410.23537 | null |
| 2024-10-30 | BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference | Junqi Zhao et.al. | 2410.23079 | link |
| 2024-10-29 | Scaling LLM Inference with Optimized Sample Compute Allocation | Kexun Zhang et.al. | 2410.22480 | link |
| 2024-10-29 | SVIP: Towards Verifiable Inference of Open-source Large Language Models | Yifan Sun et.al. | 2410.22307 | null |
| 2025-02-08 | ProMoE: Fast MoE-based LLM Serving using Proactive Caching | Xiaoniu Song et.al. | 2410.22134 | null |
| 2025-01-21 | MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression | Noel Elias et.al. | 2410.21548 | link |
| 2025-04-29 | ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | Hanshi Sun et.al. | 2410.21465 | null |
| 2024-10-27 | FIRP: Faster LLM inference via future intermediate representation prediction | Pengfei Wu et.al. | 2410.20488 | null |
| 2024-10-29 | Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management | Tuowei Wang et.al. | 2410.19274 | null |
| 2024-10-24 | Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Ruisi Cai et.al. | 2410.19123 | link |
| 2024-10-30 | Dynamic Vocabulary Pruning in Early-Exit LLMs | Jort Vincenti et.al. | 2410.18952 | link |
| 2024-10-25 | A Survey on Speech Large Language Models | Jing Peng et.al. | 2410.18908 | null |
| 2024-10-24 | A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs | Ankit Singh Rawat et.al. | 2410.18779 | null |
| 2024-10-24 | BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching | Peizhuang Cong et.al. | 2410.18701 | null |
| 2024-10-23 | CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation | Qinsi Wang et.al. | 2410.18311 | null |
| 2024-10-25 | Fast Inference for Augmented Large Language Models | Rana Shahout et.al. | 2410.18248 | null |
| 2024-10-23 | POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference | Aditya K Kamath et.al. | 2410.18038 | null |
| 2024-12-29 | AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning | Yehonathan Refael et.al. | 2410.17881 | null |
| 2024-10-22 | FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs | Haoran Lin et.al. | 2410.16663 | null |
| 2024-10-22 | Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency | Prafulla Kumar Choubey et.al. | 2410.16597 | null |
| 2024-12-18 | MagicPIG: LSH Sampling for Efficient LLM Generation | Zhuoming Chen et.al. | 2410.16179 | link |
| 2024-10-21 | Natural GaLore: Accelerating GaLore for memory-efficient LLM Training and Fine-tuning | Arijit Das et.al. | 2410.16029 | link |
| 2024-10-21 | RAC: Efficient LLM Factuality Correction with Retrieval Augmentation | Changmao Li et.al. | 2410.15667 | link |
| 2024-10-21 | Bayesian Concept Bottleneck Models with LLM Priors | Jean Feng et.al. | 2410.15555 | link |
| 2024-10-20 | CompAct: Compressed Activations for Memory-Efficient LLM Training | Yara Shamshoum et.al. | 2410.15352 | null |
| 2024-10-20 | EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models | Junhao Hu et.al. | 2410.15332 | null |
| 2024-10-19 | IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System | Minseok Seo et.al. | 2410.15008 | null |
| 2024-10-23 | Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching | Jie Peng et.al. | 2410.14740 | null |
| 2024-10-18 | A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference | You Wu et.al. | 2410.14442 | link |
| 2024-10-18 | Revisiting SLO and Goodput Metrics in LLM Serving | Zhibin Wang et.al. | 2410.14257 | null |
| 2024-10-18 | Leveraging Large Language Models for Enhancing Public Transit Services | Jiahao Wang et.al. | 2410.14147 | null |
| 2024-10-17 | RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs | Jiatan Huang et.al. | 2410.13987 | null |
| 2024-11-07 | Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Tianyu Guo et.al. | 2410.13835 | link |
| 2024-10-17 | Progressive Mixed-Precision Decoding for Efficient LLM Inference | Hao Mark Chen et.al. | 2410.13461 | null |
| 2024-10-17 | Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning | Minseok Choi et.al. | 2410.13274 | null |
| 2024-10-17 | Data Defenses Against Large Language Models | William Agnew et.al. | 2410.13138 | link |
| 2024-10-19 | In-context KV-Cache Eviction for LLMs via Attention-Gate | Zihao Zeng et.al. | 2410.12876 | null |
| 2024-10-10 | RecurFormer: Not All Transformer Heads Need Self-Attention | Ruiqing Yan et.al. | 2410.12850 | null |
| 2024-10-16 | COMET: Towards Partical W4A4KV4 LLMs Serving | Lian Liu et.al. | 2410.12168 | null |
| 2024-10-16 | Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning | Huiwen Wu et.al. | 2410.12130 | null |
| 2024-10-15 | Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix | Yingyu Liang et.al. | 2410.11261 | null |
| 2024-10-06 | Continuous Approximations for Improving Quantization Aware Training of LLMs | He Li et.al. | 2410.10849 | null |
| 2024-10-14 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Guangxuan Xiao et.al. | 2410.10819 | link |
| 2024-10-16 | SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization | Akrit Mudvari et.al. | 2410.10759 | null |
| 2024-10-12 | Power-Softmax: Towards Secure LLM Inference over Encrypted Data | Itamar Zimerman et.al. | 2410.09457 | null |
| 2024-10-11 | Large Language Models for Energy-Efficient Code: Emerging Results and Future Directions | Huiyun Peng et.al. | 2410.09241 | null |
| 2024-10-11 | SubZero: Random Subspace Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning | Ziming Yu et.al. | 2410.08989 | link |
| 2024-12-03 | HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework | Yinuo Ren et.al. | 2410.08316 | null |
| 2024-10-14 | Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining | Tianyi Bai et.al. | 2410.08102 | link |
| 2024-10-09 | SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration | Heming Xia et.al. | 2410.06916 | link |
| 2024-10-08 | Active Evaluation Acquisition for Efficient LLM Benchmarking | Yang Li et.al. | 2410.05952 | null |
| 2024-10-08 | Exploring the Meaningfulness of Nearest Neighbor Search in High-Dimensional Space | Zhonghan Chen et.al. | 2410.05752 | null |
| 2024-10-08 | ParallelSpec: Parallel Drafter for Efficient Speculative Decoding | Zilin Xiao et.al. | 2410.05589 | null |
| 2024-10-07 | Fast State Restoration in LLM Serving with HCache | Shiwei Gao et.al. | 2410.05004 | null |
| 2024-10-06 | RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference | Yige Xu et.al. | 2410.04519 | link |
| 2025-01-23 | Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective | Jinhao Li et.al. | 2410.04466 | null |
| 2024-12-05 | SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | Aurick Qiao et.al. | 2410.03960 | null |
| 2024-10-04 | LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity | Selim Furkan Tekin et.al. | 2410.03953 | link |
| 2024-10-04 | EXAQ: Exponent Aware Quantization For LLMs Acceleration | Moran Shkolnik et.al. | 2410.03185 | link |
| 2024-10-04 | UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference | Jing Xiong et.al. | 2410.03090 | null |
| 2024-10-03 | LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences | Zhenxiao Fu et.al. | 2410.02950 | null |
| 2024-10-03 | Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration | Yun Qu et.al. | 2410.02511 | link |
| 2024-10-03 | LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services | Małgorzata Łazuka et.al. | 2410.02425 | link |
| 2024-10-04 | Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation | Xiaoqun Liu et.al. | 2410.02220 | null |
| 2024-10-05 | Aligning with Logic: Measuring, Evaluating and Improving Logical Consistency in Large Language Models | Yinhong Liu et.al. | 2410.02205 | null |
| 2024-10-02 | Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads | Yuxiang Huang et.al. | 2410.01805 | link |
| 2024-10-02 | ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving | Yifan Qiao et.al. | 2410.01228 | null |
| 2024-10-01 | TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices | Zonghang Li et.al. | 2410.00531 | link |
| 2024-10-09 | LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | Yi Xiong et.al. | 2410.00428 | null |
| 2024-11-06 | The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems | Linke Song et.al. | 2409.20002 | null |
| 2024-09-28 | SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models | Yi Wu et.al. | 2409.19471 | null |
| 2024-11-28 | Confidential Prompting: Protecting User Prompts from Cloud LLM Providers | In Gim et.al. | 2409.19134 | link |
| 2024-09-26 | Control Industrial Automation System with Large Language Models | Yuchen Xia et.al. | 2409.18009 | link |
| 2024-10-18 | Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores | Shaobo Ma et.al. | 2409.17870 | null |
| 2024-09-25 | Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | Zhenmei Shi et.al. | 2409.17422 | link |
| 2025-06-23 | Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations | Amey Agrawal et.al. | 2409.17264 | null |
| 2024-09-25 | Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale | Fan Zhou et.al. | 2409.17115 | link |
| 2024-09-25 | Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference | Zongyue Qin et.al. | 2409.16560 | null |
| 2024-10-21 | AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization | Yifan Tan et.al. | 2409.16546 | link |
| 2024-11-07 | Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines | Lei Gao et.al. | 2409.15520 | link |
| 2024-10-29 | Eagle: Efficient Training-Free Router for Multi-LLM Inference | Zesen Zhao et.al. | 2409.15518 | null |
| 2024-10-03 | Archon: An Architecture Search Framework for Inference-Time Techniques | Jon Saad-Falcon et.al. | 2409.15254 | link |
| 2024-09-23 | CSPS: A Communication-Efficient Sequence-Parallelism based Serving System for Transformer based Models with Long Prompts | Zeyu Zhang et.al. | 2409.15104 | null |
| 2024-09-25 | UELLM: A Unified and Efficient Approach for LLM Inference Serving | Yiyuan He et.al. | 2409.14961 | null |
| 2024-11-01 | RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph | Lindsey Linxi Wei et.al. | 2409.14556 | null |
| 2024-09-21 | Practically implementing an LLM-supported collaborative vulnerability remediation process: a team-based approach | Xiaoqing Wang et.al. | 2409.14058 | null |
| 2024-10-21 | Do Large Language Models Need a Content Delivery Network? | Yihua Cheng et.al. | 2409.13761 | link |
| 2024-09-19 | PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs) | Mahmoud Nazzal et.al. | 2409.12699 | link |
| 2024-09-12 | LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs | Han Xu et.al. | 2409.11424 | null |
| 2024-09-04 | ISO: Overlap of Computation and Communication within Seqenence For LLM Inference | Bin Xiao et.al. | 2409.11155 | null |
| 2024-12-31 | RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | Di Liu et.al. | 2409.10516 | link |
| 2024-09-12 | Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat | Sidong Feng et.al. | 2409.07829 | null |
| 2024-09-13 | LLM-Enhanced Software Patch Localization | Jinhong Yu et.al. | 2409.06816 | null |
| 2024-09-24 | OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models | Jahyun Koo et.al. | 2409.05902 | null |
| 2024-09-08 | InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference | Xiurui Pan et.al. | 2409.04992 | null |
| 2024-09-07 | Achieving Peak Performance for Large Language Models: A Systematic Review | Zhyar Rzgar K Rostam et.al. | 2409.04833 | null |
| 2024-09-06 | Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance | Guanyu Lin et.al. | 2409.04593 | null |
| 2024-09-06 | A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage | Huan Yang et.al. | 2409.04040 | null |
| 2024-11-05 | Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study | Jianwei Zhu et.al. | 2409.03992 | null |
| 2024-09-05 | Sirius: Contextual Sparsity with Correction for Efficient LLMs | Yang Zhou et.al. | 2409.03856 | link |
| 2024-08-31 | HSF: Defending against Jailbreak Attacks with Hidden State Filtering | Cheng Qian et.al. | 2409.03788 | null |
| 2024-12-11 | Efficient Large Foundation Model Inference: A Perspective From Model and System Co-Design | Dong Liu et.al. | 2409.01990 | null |
| 2024-09-03 | Efficient LLM Context Distillation | Rajesh Upadhayayaya et.al. | 2409.01930 | null |
| 2024-09-03 | Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information | Xinyu Zhang et.al. | 2409.01605 | null |
| 2024-09-02 | CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification | Junhui He et.al. | 2409.01366 | null |
| 2024-12-18 | Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference | Barys Liskavets et.al. | 2409.01227 | null |
| 2024-09-01 | Research on LLM Acceleration Using the High-Performance RISC-V Processor “Xiangshan” (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product) | Xu-Hao Chen et.al. | 2409.00661 | null |
| 2024-11-10 | Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling | Guangya Wan et.al. | 2408.17017 | null |
| 2024-08-28 | Decentralized LLM Inference over Edge Networks with Energy Harvesting | Aria Khoshsirat et.al. | 2408.15907 | null |
| 2024-08-28 | Efficient LLM Scheduling by Learning to Rank | Yichao Fu et.al. | 2408.15792 | link |
| 2024-08-28 | Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation | Lujun Gui et.al. | 2408.15562 | null |
| 2024-08-23 | Memory-Efficient LLM Training with Online Subspace Descent | Kaizhao Liang et.al. | 2408.12857 | link |
| 2024-08-22 | NanoFlow: Towards Optimal Large Language Model Serving Throughput | Kan Zhu et.al. | 2408.12757 | link |
| 2024-10-23 | TensorOpera Router: A Multi-Model Router for Efficient LLM Inference | Dimitris Stripelis et.al. | 2408.12320 | null |
| 2024-09-04 | Parallel Speculative Decoding with Adaptive Draft Length | Tianyu Liu et.al. | 2408.11850 | link |
| 2024-08-21 | MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Elias Frantar et.al. | 2408.11743 | link |
| 2024-08-23 | Xinyu: An Efficient LLM-based System for Commentary Generation | Yiquan Wu et.al. | 2408.11609 | null |
| 2024-08-21 | Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning | Kai Xiong et.al. | 2408.11431 | null |
| 2024-08-21 | Image Score: Learning and Evaluating Human Preferences for Mercari Search | Chingis Oinar et.al. | 2408.11349 | null |
| 2024-08-20 | Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models | Artem Vazhentsev et.al. | 2408.10692 | null |
| 2024-08-20 | How Well Do Large Language Models Serve as End-to-End Secure Code Producers? | Jianian Gong et.al. | 2408.10495 | null |
| 2024-09-29 | GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making | Arsham Gholamzadeh Khoee et.al. | 2408.09785 | null |
| 2024-08-19 | PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars | Sumanth Prabhu et.al. | 2408.08869 | null |
| 2024-08-23 | ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models | Chao Zeng et.al. | 2408.08554 | link |
| 2024-08-14 | LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference | Seungjae Moon et.al. | 2408.07326 | null |
| 2024-08-12 | LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration | Zhiwen Mo et.al. | 2408.06003 | null |
| 2024-08-16 | Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion | Jacob K Christopher et.al. | 2408.05636 | null |
| 2024-08-10 | LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Jaehong Cho et.al. | 2408.05499 | link |
| 2024-08-05 | SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving | Andreas Kosmas Kakolyris et.al. | 2408.05235 | null |
| 2024-09-14 | Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness | Xiaojing Fan et.al. | 2408.04585 | null |
| 2024-08-08 | Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning | Ke Cheng et.al. | 2408.04323 | null |
| 2024-08-07 | Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference | Zeyu Zhang et.al. | 2408.04107 | null |
| 2024-08-07 | MPC-Minimized Secure LLM Inference | Deevashwer Rathee et.al. | 2408.03561 | null |
| 2024-08-06 | Can LLMs Serve As Time Series Anomaly Detectors? | Manqing Dong et.al. | 2408.03475 | null |
| 2024-08-05 | Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning | Hao Zhou et.al. | 2408.02549 | null |
| 2024-08-02 | The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines | Matias Martinez et.al. | 2408.01050 | null |
| 2024-08-01 | DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency | Jovan Stojkovic et.al. | 2408.00741 | null |
| 2024-08-01 | Designing Efficient LLM Accelerators for Edge Devices | Jude Haris et.al. | 2408.00462 | null |
| 2024-08-01 | Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control | Hao Zhou et.al. | 2408.00214 | null |
| 2024-09-10 | ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency | Yuhang Yao et.al. | 2408.00008 | null |
| 2024-08-01 | Responsive ML inference in multi-tenanted environments using AQUA | Abhishek Vijaya Kumar et.al. | 2407.21255 | null |
| 2024-11-04 | Palu: Compressing KV-Cache with Low-Rank Projection | Chi-Chih Chang et.al. | 2407.21118 | link |
| 2024-07-30 | Accelerating Large Language Model Inference with Self-Supervised Early Exits | Florian Valade et.al. | 2407.21082 | null |
| 2024-10-03 | ThinK: Thinner Key Cache by Query-Driven Pruning | Yuhui Xu et.al. | 2407.21018 | null |
| 2024-07-25 | An Efficient Inference Framework for Early-exit Large Language Models | Ruijie Miao et.al. | 2407.20272 | null |
| 2024-07-29 | Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | Sania Nayab et.al. | 2407.19825 | null |
| 2024-07-29 | Teaching LLMs at Charles University: Assignments and Activities | Jindřich Helcl et.al. | 2407.19798 | null |
| 2024-07-09 | Mobile Edge Intelligence for Large Language Models: A Contemporary Survey | Guanqiao Qu et.al. | 2407.18921 | null |
| 2024-07-04 | The Price of Prompting: Profiling Energy Use in Large Language Models Inference | Erik Johannes Husom et.al. | 2407.16893 | link |
| 2024-07-23 | PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets | Jaeyoung Kim et.al. | 2407.16329 | null |
| 2024-07-22 | RazorAttention: Efficient KV Cache Compression Through Retrieval Heads | Hanlin Tang et.al. | 2407.15891 | null |
| 2024-07-22 | vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving | Jiale Xu et.al. | 2407.15309 | link |
| 2024-07-20 | All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks | Ajay Jaiswal et.al. | 2407.14996 | null |
| 2024-07-19 | LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference | Qichen Fu et.al. | 2407.14057 | null |
| 2024-07-13 | Beyond KV Caching: Shared Attention for Efficient LLMs | Bingli Liao et.al. | 2407.12866 | link |
| 2025-04-01 | PQCache: Product Quantization-based KVCache for Long Context LLM Inference | Hailin Zhang et.al. | 2407.12820 | null |
| 2024-07-17 | Struct-X: Enhancing Large Language Models Reasoning with Structured Data | Xiaoyu Tan et.al. | 2407.12522 | null |
| 2024-07-17 | LLM Inference Serving: Survey of Recent Advances and Opportunities | Baolin Li et.al. | 2407.12391 | null |
| 2024-10-11 | Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale | Ayush Kaushal et.al. | 2407.12327 | link |
| 2024-11-16 | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | Branden Butler et.al. | 2407.11798 | null |
| 2024-08-16 | Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference | Yuan Feng et.al. | 2407.11550 | link |
| 2024-07-15 | Static Detection of Filesystem Vulnerabilities in Android Systems | Yu-Tsung Lee et.al. | 2407.11279 | null |
| 2024-10-03 | Fast Matrix Multiplications for Lookup Table-Quantized LLMs | Han Guo et.al. | 2407.10960 | link |
| 2024-10-02 | Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference | Zongyue Qin et.al. | 2407.09722 | null |
| 2024-08-30 | Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems | Amey Agrawal et.al. | 2407.07000 | link |
| 2024-07-08 | Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU | Daliang Xu et.al. | 2407.05858 | link |
| 2024-07-07 | A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length | Yuqing Yang et.al. | 2407.05347 | null |
| 2024-07-06 | Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning | Yun-Da Tsai et.al. | 2407.05040 | null |
| 2024-11-16 | Software-Hardware Co-Design For Embodied AI Robots | Yiyang Huang et.al. | 2407.04292 | link |
| 2024-07-04 | Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems | Grant Wilkins et.al. | 2407.04014 | null |
| 2024-10-30 | MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Huiqiang Jiang et.al. | 2407.02490 | link |
| 2024-06-29 | When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration | Philipp Allgeuer et.al. | 2407.00518 | link |
| 2024-06-29 | Teola: Towards End-to-End Optimization of LLM-based Applications | Xin Tan et.al. | 2407.00326 | null |
| 2024-06-25 | T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | Jianyu Wei et.al. | 2407.00088 | link |
| 2024-07-09 | Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Ruoyu Qin et.al. | 2407.00079 | link |
| 2024-06-28 | InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management | Wonbeom Lee et.al. | 2406.19707 | null |
| 2024-08-28 | AI-native Memory: A Pathway from LLMs Towards AGI | Jingbo Shang et.al. | 2406.18312 | null |
| 2024-06-25 | FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model | Feijie Wu et.al. | 2406.17706 | link |
| 2024-06-26 | MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool | Cunchen Hu et.al. | 2406.17565 | null |
| 2024-11-11 | Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters | Euiin Yi et.al. | 2406.16758 | link |
| 2025-05-16 | Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models | Abhimanyu Bambhaniya et.al. | 2406.01698 | null |
| 2025-05-02 | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | Yujun Lin et.al. | 2405.04532 | link |
| 2024-11-26 | Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | Haoran Qiu et.al. | 2404.08509 | null |
| 2024-05-31 | InferCept: Efficient Intercept Support for Augmented Large Language Model Inference | Reyna Abhyankar et.al. | 2402.01869 | null |
| 2023-12-08 | Efficient LLM Inference on CPUs | Haihao Shen et.al. | 2311.00502 | null |
| 2024-04-02 | SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification | Xupeng Miao et.al. | 2305.09781 | null |
LLM Scheduling
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2026-03-13 | SageSched: Efficient LLM Scheduling Confronting Demand Uncertainty and Hybridity | Zhenghao Gan et.al. | 2603.07917 | null |
| 2025-12-04 | Counting Without Running: Evaluating LLMs’ Reasoning About Code Complexity | Gregory Bolet et.al. | 2512.04355 | null |
| 2025-11-28 | LegalWebAgent: Empowering Access to Justice via LLM-Based Web Agents | Jinzhe Tan et.al. | 2512.04105 | null |
| 2025-12-03 | AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving | Ying Wang et.al. | 2512.04013 | null |
| 2025-12-02 | PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing | Junyi Hou et.al. | 2512.02589 | null |
| 2025-12-01 | Trinity: Disaggregating Vector Search from Prefill-Decode Disaggregation in LLM Serving | Yi Liu et.al. | 2512.02281 | null |
| 2025-12-01 | RoMe: Row Granularity Access Memory System for Large Language Models | Hwayong Nam et.al. | 2512.01541 | null |
| 2025-12-01 | Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity | Wenbin Zhu et.al. | 2512.01357 | null |
| 2025-12-01 | Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding | Yilong Zhao et.al. | 2512.01278 | null |
| 2025-11-30 | Neural Variable Name Repair: Learning to Rename Identifiers for Readability | Muhammad Yousuf et.al. | 2512.01141 | null |
| 2025-11-28 | OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning | Zixun Huang et.al. | 2511.23310 | null |
| 2025-11-28 | Beyond Curve Fitting: Neuro-Symbolic Agents for Context-Aware Epidemic Forecasting | Joongwon Chae et.al. | 2511.23276 | null |
| 2025-11-27 | OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency | Jun Wang et.al. | 2511.22481 | null |
| 2025-11-27 | FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators | Shuao Jia et.al. | 2511.22348 | null |
| 2025-11-27 | Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation | Zehao Deng et.al. | 2511.22235 | null |
| 2025-11-27 | Optimizing NetGPT via Routing-Based Synergy and Reinforcement Learning | Yuxuan Chen et.al. | 2511.22217 | null |
| 2025-11-26 | OOCO: Latency-disaggregated Architecture for Online-Offline Co-locate LLM Serving | Siyu Wu et.al. | 2511.21862 | null |
| 2025-12-01 | DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving | Fengze Yu et.al. | 2511.21669 | null |
| 2025-11-28 | DOPO: A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving | Junhan Liao et.al. | 2511.20982 | null |
| 2025-11-26 | Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows | Yinwei Dai et.al. | 2511.20975 | null |
| 2025-11-25 | Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios | Luohe Shi et.al. | 2511.20340 | null |
| 2025-11-25 | Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design | Zixiao Huang et.al. | 2511.20048 | null |
| 2025-11-25 | HiCoGen: Hierarchical Compositional Text-to-Image Generation in Diffusion Models via Reinforcement Learning | Hongji Yang et.al. | 2511.19965 | null |
| 2025-11-24 | Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution | Dingkang Liang et.al. | 2511.19430 | null |
| 2025-11-24 | How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining | Kairong Luo et.al. | 2511.18903 | null |
| 2025-11-24 | Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference | Wengyi Zhan et.al. | 2511.18875 | null |
| 2025-11-23 | Optimal Meal Schedule for a Local Nonprofit Using LLM-Aided Data Extraction | Sergio Marin et.al. | 2511.18483 | null |
| 2025-11-28 | Progressive Localisation in Localist LLMs | Joachim Diederich et.al. | 2511.18375 | null |
| 2025-11-23 | Hybrid Agentic AI and Multi-Agent Systems in Smart Manufacturing | Mojtaba A. Farahani et.al. | 2511.18258 | null |
| 2025-11-22 | Towards a General Framework for HTN Modeling with LLMs | Israel Puerta-Merino et.al. | 2511.18165 | null |
| 2025-11-20 | LLM4EO: Large Language Model for Evolutionary Optimization in Flexible Job Shop Scheduling | Rongjie Liao et.al. | 2511.16485 | null |
| 2025-11-20 | Operon: Incremental Construction of Ragged Data via Named Dimensions | Sungbin Moon et.al. | 2511.16080 | null |
| 2025-11-19 | MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping | Yushi Huang et.al. | 2511.15690 | null |
| 2025-11-18 | Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models | Rui Zhu et.al. | 2511.14694 | null |
| 2025-11-23 | Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning | Ruoyu Qin et.al. | 2511.14617 | null |
| 2025-11-18 | Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks | Mulei Ma et.al. | 2511.14450 | null |
| 2025-11-17 | The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training | Subramanyam Sahoo et.al. | 2511.13016 | null |
| 2025-11-17 | ENGRAM: Effective, Lightweight Memory Orchestration for Conversational Agents | Daivik Patel et.al. | 2511.12960 | null |
| 2025-11-17 | CoS: Towards Optimal Event Scheduling via Chain-of-Scheduling | Yiming Zhao et.al. | 2511.12913 | null |
| 2025-11-19 | Harli: SLO-Aware Co-location of LLM Inference and PEFT-based Finetuning on Model-as-a-Service Platforms | Ao Xu et.al. | 2511.11729 | null |
| 2025-11-05 | AnchorTP: Resilient LLM Inference with State-Preserving Elastic Tensor Parallelism | Wendong Xu et.al. | 2511.11617 | null |
| 2025-11-13 | EEGAgent: A Unified Framework for Automated EEG Analysis Using Large Language Models | Sha Zhao et.al. | 2511.09947 | null |
| 2025-11-12 | AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting | Renda Li et.al. | 2511.09478 | null |
| 2025-11-12 | POTSA: A Cross-Lingual Speech Alignment Framework for Low Resource Speech-to-Text Translation | Xuanchen Li et.al. | 2511.09232 | null |
| 2025-11-12 | FLAD: Federated Learning for LLM-based Autonomous Driving in Vehicle-Edge-Cloud Networks | Tianao Xiang et.al. | 2511.09025 | null |
| 2025-11-07 | Motif 2 12.7B technical report | Junghwan Lim et.al. | 2511.07464 | null |
| 2025-11-10 | LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure | Jaehong Cho et.al. | 2511.07229 | null |
| 2025-11-10 | Can LLM Annotations Replace User Clicks for Learning to Rank? | Lulu Yu et.al. | 2511.06635 | null |
| 2025-11-09 | AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving | Ruifei Zhang et.al. | 2511.06253 | null |
| 2025-11-08 | CoEdge-RAG: Optimizing Hierarchical Scheduling for Retrieval-Augmented LLMs in Collaborative Edge Computing | Guihang Hong et.al. | 2511.05915 | null |
| 2025-11-09 | Optimal Inference Schedules for Masked Diffusion Models | Sitan Chen et.al. | 2511.04647 | null |
| 2025-11-06 | PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration | Yue Jiet Chong et.al. | 2511.04036 | null |
| 2025-11-05 | ALAS: Transactional and Dynamic Multi-Agent LLM Planning | Longling Geng et.al. | 2511.03094 | null |
| 2025-11-04 | LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context | Yudong Li et.al. | 2511.02366 | null |
| 2025-11-04 | An LLM-powered MILP modelling engine for workforce scheduling guided by expert knowledge | Qingyang Li et.al. | 2511.02364 | null |
| 2025-11-04 | Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live | Hanchen Li et.al. | 2511.02230 | null |
| 2025-11-04 | Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration | Jingbo Wang et.al. | 2511.02200 | null |
| 2025-11-03 | TPS-Bench: Evaluating AI Agents’ Tool Planning & Scheduling Abilities in Compounding Tasks | Hanwen Xu et.al. | 2511.01527 | null |
| 2025-11-03 | Modular Task Decomposition and Dynamic Collaboration in Multi-Agent Systems Driven by Large Language Models | Shuaidong Pan et.al. | 2511.01149 | null |
| 2025-11-05 | FREESH: Fair, Resource- and Energy-Efficient Scheduling for LLM Serving on Heterogeneous GPUs | Xuan He et.al. | 2511.00807 | null |
| 2025-11-02 | AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs | Ran Yan et.al. | 2511.00796 | null |
| 2025-10-19 | Justitia: Fair and Efficient Scheduling for LLM Applications | Mingyan Yang et.al. | 2510.17015 | null |
| 2025-10-08 | OptPipe: Memory- and Scheduling-Optimized Pipeline Parallelism for LLM Training | Hongpei Li et.al. | 2510.05186 | null |
| 2025-08-14 | Block: Balancing Load in LLM Serving with Context, Knowledge and Predictive Scheduling | Wei Da et.al. | 2508.03611 | null |
| 2025-08-05 | Optimal Scheduling Algorithms for LLM Inference: Theory and Practice | Agrim Bari et.al. | 2508.01002 | null |
| 2025-09-16 | InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching | Yilun Wang et.al. | 2507.08523 | null |
| 2025-07-09 | Gradientsys: A Multi-Agent LLM Scheduler with ReAct Orchestration | Xinyuan Song et.al. | 2507.06520 | null |
| 2025-06-17 | Semantic Scheduling for LLM Inference | Wenyue Hua et.al. | 2506.12204 | null |
| 2025-05-29 | Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters | Hayden Moore et.al. | 2505.23554 | null |
| 2025-05-26 | Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency | Ruixiao Li et.al. | 2505.17074 | null |
| 2025-05-14 | ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor | Seungbeom Choi et.al. | 2505.09142 | null |
| 2025-04-25 | Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents | Yueying Li et.al. | 2504.07347 | null |
| 2025-04-08 | LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications | Botao Zhu et.al. | 2504.03444 | null |
| 2025-07-25 | How do language models learn facts? Dynamics, curricula and hallucinations | Nicolas Zucchet et.al. | 2503.21676 | null |
| 2025-05-21 | Online Scheduling for LLM Inference with KV Cache Constraints | Patrick Jaillet et.al. | 2502.07115 | null |
| 2025-11-06 | LLM Query Scheduling with Prefix Reuse and Latency Constraints | Gregory Dexter et.al. | 2502.04677 | null |
| 2024-11-01 | ALISE: Accelerating Large Language Model Serving with Speculative Scheduling | Youpeng Zhao et.al. | 2410.23537 | null |
| 2025-06-08 | PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference | Zeyu Zhang et.al. | 2409.15104 | null |
| 2024-08-28 | Efficient LLM Scheduling by Learning to Rank | Yichao Fu et.al. | 2408.15792 | null |
| 2024-11-15 | Large Language Models for Power Scheduling: A User-Centric Approach | Thomas Mongaillard et.al. | 2407.00476 | null |
| 2024-06-07 | Llumnix: Dynamic Scheduling for Large Language Model Serving | Biao Sun et.al. | 2406.03243 | null |
| 2024-05-24 | PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services | Zheming Yang et.al. | 2405.14636 | null |
| 2024-05-14 | Automated Conversion of Static to Dynamic Scheduler via Natural Language | Paul Mingzheng Tang et.al. | 2405.06697 | null |
| 2024-08-06 | On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS) | Vishal Pallagani et.al. | 2401.02500 | null |
| 2023-05-30 | Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline | Zangwei Zheng et.al. | 2305.13144 | null |
MoE
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-04-02 | The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level | Jeremy Herbst et.al. | 2604.02178 | null |
| 2026-04-02 | FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators | Chi Zhang et.al. | 2604.02110 | null |
| 2026-04-02 | SURE: Synergistic Uncertainty-aware Reasoning for Multimodal Emotion Recognition in Conversations | Yiqiang Cai et.al. | 2604.01916 | null |
| 2026-04-02 | FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models | Juyong Jiang et.al. | 2604.01762 | null |
| 2026-04-02 | M3D-BFS: a Multi-stage Dynamic Fusion Strategy for Sample-Adaptive Multi-Modal Brain Network Analysis | Rui Dong et.al. | 2604.01667 | null |
| 2026-04-02 | Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models | Shuibai Zhang et.al. | 2604.01622 | null |
| 2026-04-02 | DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72 | Wanqian Li et.al. | 2604.01621 | null |
| 2026-04-01 | Learning When to See and When to Feel: Adaptive Vision-Torque Fusion for Contact-Aware Manipulation | Jiuzhou Lei et.al. | 2604.01414 | null |
| 2026-04-01 | Sparse Spectral LoRA: Routed Experts for Medical VLMs | Omid Nejati Manzari et.al. | 2604.01310 | null |
| 2026-04-01 | Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks for Continual LLM Learning | Mohammad R. Abu Ayyash et.al. | 2604.01152 | null |
| 2026-04-02 | Asymptotically Optimal Sequential Testing with Heterogeneous LLMs | Guokai Li et.al. | 2604.01086 | null |
| 2026-04-01 | PHASOR: Anatomy- and Phase-Consistent Volumetric Diffusion for CT Virtual Contrast Enhancement | Zilong Li et.al. | 2604.01053 | null |
| 2026-04-01 | KUET at StanceNakba Shared Task: StanceMoE: Mixture-of-Experts Architecture for Stance Detection | Abdullah Al Shafi et.al. | 2604.00878 | null |
| 2026-04-01 | Cost-Penalized Fitness in FMA-Orchestrated Mixture of Experts: Experimental Evidence for Molecular Memory in Domain Adaptation | Martin Jaraiz et.al. | 2604.00812 | null |
| 2026-04-01 | Routing-Free Mixture-of-Experts | Yilun Liu et.al. | 2604.00801 | null |
| 2026-04-01 | Scalable Pretraining of Large Mixture of Experts Language Models on Aurora Super Computer | Dharma Teja Vooturi et.al. | 2604.00785 | null |
| 2026-04-01 | Toward Optimal Sampling Rate Selection and Unbiased Classification for Precise Animal Activity Recognition | Axiu Mao et.al. | 2604.00517 | null |
| 2026-04-01 | Self-Routing: Parameter-Free Expert Routing from Hidden States | Jama Hussein Mohamud et.al. | 2604.00421 | null |
| 2026-03-31 | From Skew to Symmetry: Node-Interconnect Multi-Path Balancing with Execution-time Planning for Modern GPU Clusters | Jinghan Yao et.al. | 2604.00317 | null |
| 2026-03-31 | Directly visualizing the energy level structure of quantum dot molecules | Heun Mo Yoo et.al. | 2604.00232 | null |
| 2026-03-31 | Towards Verifiable and Self-Correcting AI Physicists for Quantum Many-Body Simulations | Ken Deng et.al. | 2604.00149 | null |
| 2026-03-31 | PASM: Population Adaptive Symbolic Mixture-of-Experts Model for Cross-location Hurricane Evacuation Decision Prediction | Xiao Qian et.al. | 2604.00074 | null |
| 2026-03-31 | Short proofs in combinatorics and number theory | Boris Alexeev et.al. | 2603.29961 | null |
| 2026-03-31 | First energy scan measurement of $e^{+}e^{-}\to K^{+}K^{-}$ around the $ψ(2S)$ resonance | BESIII Collaboration et.al. | 2603.29854 | null |
| 2026-03-31 | Counterfactual Analysis of Brain Network Dynamics | Moo K. Chung et.al. | 2603.29843 | null |
| 2026-03-31 | Training-Free Dynamic Upcycling of Expert Language Models | Eros Fanì et.al. | 2603.29765 | null |
| 2026-03-31 | TrafficMoE: Heterogeneity-aware Mixture of Experts for Encrypted Traffic Classification | Qing He et.al. | 2603.29520 | null |
| 2026-03-31 | Aligning Multimodal Sequential Recommendations via Robust Direct Preference Optimization with Sparse MoE | Hejin Huang et.al. | 2603.29259 | null |
| 2026-03-31 | Route-Induced Density and Stability (RIDE): Controlled Intervention and Mechanism Analysis of Routing-Style Meta Prompts on LLM Internal States | Dianxing Zhang et.al. | 2603.29206 | null |
| 2026-03-31 | BiMoE: Brain-Inspired Experts for EEG-Dominant Affective State Recognition | Hongyu Zhu et.al. | 2603.29205 | null |
| 2026-03-30 | Rethinking Language Model Scaling under Transferable Hypersphere Optimization | Liliang Ren et.al. | 2603.28743 | null |
| 2026-03-30 | StreamingVLA: Streaming Vision-Language-Action Model with Action Flow Matching and Adaptive Early Observation | Yiran Shi et.al. | 2603.28565 | null |
| 2026-03-30 | Observation of $Λ^+_c\to nπ^+η$ and search for $Λ^+_c\to na_0(980)^+$ | BESIII Collaboration et.al. | 2603.28232 | null |
| 2026-03-30 | Graph Vector Field: A Unified Framework for Multimodal Health Risk Assessment from Heterogeneous Wearable and Environmental Data Streams | Silvano Coletti et.al. | 2603.28115 | null |
| 2026-03-30 | ExFusion: Efficient Transformer Training via Multi-Experts Fusion | Jiacheng Ruan et.al. | 2603.27965 | null |
| 2026-03-31 | MathGen: Revealing the Illusion of Mathematical Competence through Text-to-Image Generation | Ruiyao Liu et.al. | 2603.27959 | null |
| 2026-03-29 | KAT-Coder-V2 Technical Report | Fengxiang Li et.al. | 2603.27703 | null |
| 2026-03-29 | LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation | Shentong Mo et.al. | 2603.27693 | null |
| 2026-03-29 | PRBench: End-to-end Paper Reproduction in Physics Research | Shi Qiu et.al. | 2603.27646 | null |
| 2026-03-29 | Expert Streaming: Accelerating Low-Batch MoE Inference via Multi-chiplet Architecture and Dynamic Expert Trajectory Scheduling | Songchen Ma et.al. | 2603.27624 | null |
| 2026-03-29 | Fully Spiking Neural Networks with Target Awareness for Energy-Efficient UAV Tracking | Pengzhi Zhong et.al. | 2603.27493 | null |
| 2026-03-29 | On Token’s Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models | Chongyang Zhao et.al. | 2603.27481 | null |
| 2026-03-28 | Unveiling Code Clones in the Eclipse IIoT Software Ecosystem | Zengyang Li et.al. | 2603.27308 | null |
| 2026-03-28 | Persistent Memory Through Triple-Loop Consolidation in a Non-Gradient Dissipative Cognitive Architecture | Jianwei Lou et.al. | 2603.27188 | null |
| 2026-03-28 | Routing Sensitivity Without Controllability: A Diagnostic Study of Fairness in MoE Language Models | Junhyeok Lee et.al. | 2603.27141 | null |
| 2026-03-27 | TAPS: Task Aware Proposal Distributions for Speculative Sampling | Mohamad Zbib et.al. | 2603.27027 | null |
| 2026-03-27 | Learning to Commit: Generating Organic Pull Requests via Online Repository Memory | Mo Li et.al. | 2603.26664 | null |
| 2026-03-27 | Sustainability Is Not Linear: Quantifying Performance, Energy, and Privacy Trade-offs in On-Device Intelligence | Eziyo Ehsani et.al. | 2603.26603 | null |
| 2026-03-26 | Can Small Models Reason About Legal Documents? A Comparative Study | Snehit Vaddi et.al. | 2603.25944 | null |
| 2026-03-26 | Narrowband searches for continuous gravitational waves from known pulsars in the first two parts of the fourth LIGO–Virgo–KAGRA observing run | The LIGO Scientific Collaboration et.al. | 2603.25938 | null |
| 2026-03-26 | AD-CARE: A Guideline-grounded, Modality-agnostic LLM Agent for Real-world Alzheimer’s Disease Diagnosis with Multi-cohort Assessment, Fairness Analysis, and Reader Study | Wenlong Hou et.al. | 2603.25322 | null |
| 2026-03-26 | SliderQuant: Accurate Post-Training Quantization for LLMs | Shigeng Wang et.al. | 2603.25284 | null |
| 2026-03-26 | A Wireless World Model for AI-Native 6G Networks | Ziqi Chen et.al. | 2603.25216 | null |
| 2026-03-26 | MCLMR: A Model-Agnostic Causal Learning Framework for Multi-Behavior Recommendation | Ranxu Zhang et.al. | 2603.25126 | null |
| 2026-03-26 | MP-MoE: Matrix Profile-Guided Mixture of Experts for Precipitation Forecasting | Huyen Ngoc Tran et.al. | 2603.25046 | null |
| 2026-03-26 | MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models | Dohwan Ko et.al. | 2603.24984 | null |
| 2026-03-26 | CROSS: A Mixture-of-Experts Reinforcement Learning Framework for Generalizable Large-Scale Traffic Signal Control | Xibei Chen et.al. | 2603.24930 | null |
| 2026-03-25 | OptiSAR-Net++: A Large-Scale Benchmark and Transformer-Free Framework for Cross-Domain Remote Sensing Visual Grounding | Xiaoyu Tang et.al. | 2603.24876 | null |
| 2026-03-25 | Enes Causal Discovery | Alexis Kafantaris et.al. | 2603.24436 | null |
| 2026-03-25 | Cross Section Measurements of $\bar{n}p \rightarrow K^{+}K^{-}π^{+}(π^{0})$ via Antineutrons Produced by $J/ψ\to p π^{-} \bar{n}$ Decays | BESIII Collaboration et.al. | 2603.24272 | null |
| 2026-03-25 | B-MoE: A Body-Part-Aware Mixture-of-Experts “All Parts Matter” Approach to Micro-Action Recognition | Nishit Poddar et.al. | 2603.24245 | null |
| 2026-03-25 | Sequence-aware Large Language Models for Explainable Recommendation | Gangyi Zhang et.al. | 2603.24136 | null |
| 2026-03-25 | PCHC: Enabling Preference Conditioned Humanoid Control via Multi-Objective Reinforcement Learning | Huanyu Li et.al. | 2603.24047 | null |
| 2026-03-25 | LGEST: Dynamic Spatial-Spectral Expert Routing for Hyperspectral Image Classification | Jiawen Wen et.al. | 2603.24045 | null |
| 2026-03-25 | MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning | Andrea Manzoni et.al. | 2603.24044 | null |
| 2026-03-25 | SiftMoE: Similarity-Aware Energy-Efficient Expert Selection for Wireless Distributed MoE Inference | Qian Chen et.al. | 2603.23888 | null |
| 2026-03-24 | Lightweight Fairness for LLM-Based Recommendations via Kernelized Projection and Gated Adapters | Nan Cui et.al. | 2603.23780 | null |
| 2026-03-24 | The Diminishing Returns of Early-Exit Decoding in Modern LLMs | Rui Wei et.al. | 2603.23701 | null |
| 2026-03-24 | VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs | Haoran Yuan et.al. | 2603.23481 | link |
| 2026-03-24 | Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning | Connor Mclaughlin et.al. | 2603.23436 | null |
| 2026-03-24 | Amplitude Analysis of the Isospin-Violating Decay $J/ψ\rightarrowγηπ^{0}$ | BESIII Collaboration et.al. | 2603.23081 | null |
| 2026-03-24 | IntentWeave: A Progressive Entry Ladder for Multi-Surface Browser Agents in Cloud Portals | Wanying Mo et.al. | 2603.22917 | null |
| 2026-03-24 | Search for the radiative decays $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$ | BESIII Collaboration et.al. | 2603.22804 | null |
| 2026-03-24 | KALAVAI: Predicting When Independent Specialist Fusion Works – A Quantitative Model for Post-Hoc Cooperative LLM Training | Ramchand Kumaresan et.al. | 2603.22755 | null |
| 2026-03-24 | Why Database Manuals Are Not Enough: Efficient and Reliable Configuration Tuning for DBMSs via Code-Driven LLM Agents | Xinyi Zhang et.al. | 2603.22708 | null |
| 2026-03-23 | Bridging the Know-Act Gap via Task-Level Autoregressive Reasoning | Jihyun Janice Ahn et.al. | 2603.22619 | null |
| 2026-03-23 | FullCircle: Effortless 3D Reconstruction from Casual 360$^\circ$ Captures | Yalda Foroutan et.al. | 2603.22572 | null |
| 2026-03-23 | 3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing | Haoyu Zhen et.al. | 2603.22279 | null |
| 2026-03-23 | A bending in the size-mass relation of star-forming galaxies across $0.5 < z < 6.0$ at a critical stellar mass of $10^{10}M_\odot$ revealed by JWST | Longyue Chen et.al. | 2603.22239 | null |
| 2026-03-23 | Mixture of Mini Experts: Overcoming the Linear Layer Bottleneck in Multiple Instance Learning | Daniel Shao et.al. | 2603.22198 | null |
| 2026-03-23 | ADaFuSE: Adaptive Diffusion-generated Image and Text Fusion for Interactive Text-to-Image Retrieval | Zhuocheng Zhang et.al. | 2603.21886 | null |
| 2026-03-23 | Holistic Scaling Laws for Optimal Mixture-of-Experts Architecture Optimization | Weilin Wan et.al. | 2603.21862 | null |
| 2026-03-23 | DiT-Flow: Speech Enhancement Robust to Multiple Distortions based on Flow Matching in Latent Space and Diffusion Transformers | Tianyu Cao et.al. | 2603.21608 | null |
| 2026-03-22 | Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity | Zihan Fang et.al. | 2603.21276 | null |
| 2026-03-22 | QMoP: Query Guided Mixture-of-Projector for Efficient Visual Token Compression | Zhongyang Li et.al. | 2603.21232 | null |
| 2026-03-22 | MI-DPG: Decomposable Parameter Generation Network Based on Mutual Information for Multi-Scenario Recommendation | Wenzhuo Cheng et.al. | 2603.21209 | null |
| 2026-03-22 | Diffusion-based Probabilistic Air Quality Forecasting with Mechanistic Insight | Ao Ding et.al. | 2603.21131 | null |
| 2026-03-22 | Mixture of Chapters: Scaling Learnt Memory in Transformers | Tasmay Pankaj Tibrewal et.al. | 2603.21096 | null |
| 2026-03-22 | CoVFT: Context-aware Visual Fine-tuning for Multimodal Large Language Models | Nan Zhou et.al. | 2603.21077 | null |
| 2026-03-22 | LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning | Jianing Wang et.al. | 2603.21065 | null |
| 2026-03-21 | Satellite-to-Street: Synthesizing Post-Disaster Views from Satellite Imagery via Generative Vision Models | Yifan Yang et.al. | 2603.20697 | null |
| 2026-03-21 | CFNN: Continued Fraction Neural Network | Chao Wang et.al. | 2603.20634 | null |
| 2026-03-21 | A 4R-supported circular product-service system for luxury branded events | Ke Ma et.al. | 2603.20613 | null |
| 2026-03-20 | AE-LLM: Adaptive Efficiency Optimization for Large Language Models | Kaito Tanaka et.al. | 2603.20492 | null |
| 2026-03-20 | Thinking in Different Spaces: Domain-Specific Latent Geometry Survives Cross-Architecture Translation | Marcus Armstrong et.al. | 2603.20406 | null |
| 2026-03-20 | Gesture2Speech: How Far Can Hand Movements Shape Expressive Speech? | Lokesh Kumar et.al. | 2603.19831 | null |
| 2026-03-20 | Making Video Models Adhere to User Intent with Minor Adjustments | Daniel Ajisafe et.al. | 2603.19672 | null |
| 2026-03-20 | Structured Prompting for Arabic Essay Proficiency: A Trait-Centric Evaluation Approach | Salim Al Mandhari et.al. | 2603.19668 | null |
| 2026-03-20 | CS-MUNet: A Channel-Spatial Dual-Stream Mamba Network for Multi-Organ Segmentation | Yuyang Zheng et.al. | 2603.19659 | null |
| 2026-03-20 | UniBioTransfer: A Unified Framework for Multiple Biometrics Transfer | Caiyi Sun et.al. | 2603.19637 | null |
| 2026-03-19 | Scalable Prompt Routing via Fine-Grained Latent Task Discovery | Yunyi Zhang et.al. | 2603.19415 | null |
| 2026-03-22 | Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation | Zhuolin Yang et.al. | 2603.19220 | null |
| 2026-03-19 | DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Efficient MoE Inference on Edge | Yuegui Huang et.al. | 2603.19172 | null |
| 2026-03-19 | ATG-MoE: Autoregressive trajectory generation with mixture-of-experts for assembly skill learning | Weihang Huang et.al. | 2603.19029 | null |
| 2026-03-19 | GWTC-4.0: Tests of General Relativity. III. Tests of the Remnants | The LIGO Scientific Collaboration et.al. | 2603.19021 | null |
| 2026-03-19 | GWTC-4.0: Tests of General Relativity. II. Parameterized Tests | The LIGO Scientific Collaboration et.al. | 2603.19020 | null |
| 2026-03-19 | GWTC-4.0: Tests of General Relativity. I. Overview and General Tests | The LIGO Scientific Collaboration et.al. | 2603.19019 | null |
| 2026-03-19 | DriftGuard: Mitigating Asynchronous Data Drift in Federated Learning | Yizhou Han et.al. | 2603.18872 | null |
| 2026-03-19 | Empathetic Motion Generation for Humanoid Educational Robots via Reasoning-Guided Vision–Language–Motion Diffusion Architecture | Fuze Sun et.al. | 2603.18771 | null |
| 2026-03-19 | Observation of $D_s^+ \to a_0(980)^+f_0(500)$ in the Amplitude Analysis of $D_s^+ \to π^+ π^0 π^0 η$ | BESIII Collaboration et.al. | 2603.18521 | null |
| 2026-03-19 | AIMER: Calibration-Free Task-Agnostic MoE Pruning | Zongfang Liu et.al. | 2603.18492 | null |
| 2026-03-19 | AlignMamba-2: Enhancing Multimodal Fusion and Sentiment Analysis with Modality-Aware Mamba | Yan Li et.al. | 2603.18462 | null |
| 2026-03-19 | Spatially Indirect Exciton Condensation in Two-Dimensional Strongly Correlated Semimetals | Yao Zeng et.al. | 2603.18445 | null |
| 2026-03-18 | Path-Constrained Mixture-of-Experts | Zijin Gu et.al. | 2603.18297 | null |
| 2026-03-18 | CORE: Robust Out-of-Distribution Detection via Confidence and Orthogonal Residual Scoring | Jin Mo Yang et.al. | 2603.18290 | null |
| 2026-03-18 | Resonance-enhanced integrated acousto-optic beam steering | Yue Yu et.al. | 2603.18191 | null |
| 2026-03-18 | Understanding Task Aggregation for Generalizable Ultrasound Foundation Models | Fangyijie Wang et.al. | 2603.18123 | null |
| 2026-03-18 | DebugLM: Learning Traceable Training Data Provenance for LLMs | Wenjie Jacky Mo et.al. | 2603.17884 | null |
| 2026-03-18 | The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency | Huamin Chen et.al. | 2603.17280 | null |
| 2026-03-17 | Knowledge Localization in Mixture-of-Experts LLMs Using Cross-Lingual Inconsistency | Lucas Bandarkar et.al. | 2603.17102 | null |
| 2026-03-17 | Edge-Efficient Two-Stream Multimodal Architecture for Non-Intrusive Bathroom Fall Detection | Haitian Wang et.al. | 2603.17069 | null |
| 2026-03-17 | SpecMoE: Spectral Mixture-of-Experts Foundation Model for Cross-Species EEG Decoding | D. Darankoum et.al. | 2603.16739 | null |
| 2026-03-17 | HMAR: Hierarchical Modality-Aware Expert and Dynamic Routing Medical Image Retrieval Architecture | Aojie Yuan et.al. | 2603.16679 | null |
| 2026-03-19 | Mixture of Style Experts for Diverse Image Stylization | Shihao Zhu et.al. | 2603.16649 | null |
| 2026-03-17 | Tarab: A Multi-Dialect Corpus of Arabic Lyrics and Poetry | Mo El-Haj et.al. | 2603.16601 | null |
| 2026-03-17 | Visual Distraction Undermines Moral Reasoning in Vision-Language Models | Xinyi Yang et.al. | 2603.16445 | null |
| 2026-03-18 | EngGPT2: Sovereign, Efficient and Open Intelligence | G. Ciarfaglia et.al. | 2603.16430 | null |
| 2026-03-17 | PlotTwist: A Creative Plot Generation Framework with Small Language Models | Abhinav Thorat et.al. | 2603.16410 | null |
| 2026-03-17 | DynamicGate MLP Conditional Computation via Learned Structural Dropout and Input Dependent Gating for Functional Plasticity | Yong Il Choi et.al. | 2603.16367 | null |
| 2026-03-17 | Behavioral Steering in a 35B MoE Language Model via SAE-Decoded Probe Vectors: One Agency Axis, Not Five Traits | Jia Qing Yap et.al. | 2603.16335 | null |
| 2026-03-17 | AW-MoE: All-Weather Mixture of Experts for Robust Multi-Modal 3D Object Detection | Hongwei Lin et.al. | 2603.16261 | null |
| 2026-03-17 | Accelerating Approximate Analytical Join Queries over Unstructured Data with Statistical Guarantees | Yuxuan Zhu et.al. | 2603.16153 | null |
| 2026-03-16 | Confidently Wrong: Why Ignoring Binaries Biases IMF Inference at Large Sample Sizes | Anna L. Rosen et.al. | 2603.15779 | null |
| 2026-03-16 | Mastering the Minority: An Uncertainty-guided Multi-Expert Framework for Challenging-tailed Sequence Learning | Ye Wang et.al. | 2603.15708 | null |
| 2026-03-16 | Bridging Local and Global Knowledge: Cascaded Mixture-of-Experts Learning for Near-Shortest Path Routing | Yung-Fu Chen et.al. | 2603.15541 | null |
| 2026-03-16 | Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis | Penny Chong et.al. | 2603.15483 | null |
| 2026-03-16 | A Closer Look into LLMs for Table Understanding | Jia Wang et.al. | 2603.15402 | null |
| 2026-03-16 | MoE-ACT: Scaling Multi-Task Bimanual Manipulation with Sparse Language-Conditioned Mixture-of-Experts Transformers | Kangjun Guo et.al. | 2603.15265 | null |
| 2026-03-17 | Tracking the Discriminative Axis: Dual Prototypes for Test-Time OOD Detection Under Covariate Shift | Wooseok Lee et.al. | 2603.15213 | null |
| 2026-03-16 | ForceVLA2: Unleashing Hybrid Force-Position Control with Force Awareness for Contact-Rich Manipulation | Yang Li et.al. | 2603.15169 | null |
| 2026-03-16 | M2IR: Proactive All-in-One Image Restoration via Mamba-style Modulation and Mixture-of-Experts | Shiwei Wang et.al. | 2603.14816 | null |
| 2026-03-16 | Genetic Algorithms in Regression | Mo Li et.al. | 2603.14801 | null |
| 2026-03-16 | Universe Routing: Why Self-Evolving Agents Need Epistemic Control | Zhaohui Geoffrey Wang et.al. | 2603.14799 | null |
| 2026-03-15 | TopoCL: Topological Contrastive Learning for Medical Imaging | Guangyu Meng et.al. | 2603.14647 | null |
| 2026-03-15 | A measurement of gas rotation in galaxy groups via the kinetic Sunyaev-Zeldovich effect | Tianyi Yang et.al. | 2603.14494 | null |
| 2026-03-15 | Towards One-for-All Anomaly Detection for Tabular Data | Shiyuan Li et.al. | 2603.14407 | null |
| 2026-03-15 | WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotic Systems | Yuchen Wang et.al. | 2603.14392 | null |
| 2026-03-15 | M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling | Mayank Mishra et.al. | 2603.14360 | null |
| 2026-03-15 | A Physically-Grounded Attack and Adaptive Defense Framework for Real-World Low-Light Image Enhancement | Tongshun Zhang et.al. | 2603.14304 | null |
| 2026-03-15 | All-sky Searches for Continuous Gravitational Waves from Isolated Neutron Stars in the Data from the First Part of the Fourth LIGO-Virgo-KAGRA Observing Run | The LIGO Scientific Collaboration et.al. | 2603.14168 | null |
| 2026-03-14 | PA-Net: Precipitation-Adaptive Mixture-of-Experts for Long-Tail Rainfall Nowcasting | Xinyu Xiao et.al. | 2603.13818 | null |
| 2026-03-14 | Implicit Maximum Likelihood Estimation for Real-time Generative Model Predictive Control | Grayson Lee et.al. | 2603.13733 | null |
| 2026-03-14 | Sparse-Dense Mixture of Experts Adapter for Multi-Modal Tracking | Yabin Zhu et.al. | 2603.13719 | null |
| 2026-03-13 | NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL | Amos Goldman et.al. | 2603.13606 | null |
| 2026-03-13 | MoEKD: Mixture-of-Experts Knowledge Distillation for Robust and High-Performing Compressed Code Models | Md. Abdul Awal et.al. | 2603.13213 | null |
| 2026-03-13 | Reference-Free Image Quality Assessment for Virtual Try-On via Human Feedback | Yuki Hirakawa et.al. | 2603.13057 | null |
| 2026-03-13 | Team RAS in 10th ABAW Competition: Multimodal Valence and Arousal Estimation Approach | Elena Ryumina et.al. | 2603.13056 | null |
| 2026-03-13 | Multimodal Protein Language Models for Enzyme Kinetic Parameters: From Substrate Recognition to Conformational Adaptation | Fei Wang et.al. | 2603.12845 | null |
| 2026-03-13 | Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking | Zizhao Mo et.al. | 2603.12831 | null |
| 2026-03-13 | LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing | Jiawei Hao et.al. | 2603.12645 | null |
| 2026-03-13 | CarPLAN: Context-Adaptive and Robust Planning with Dynamic Scene Awareness for Autonomous Driving | Junyong Yun et.al. | 2603.12607 | null |
| 2026-03-13 | Spectral Dataset of Stripped-Envelope Supernovae from the Tsinghua Supernova Group | Danfeng Xiang et.al. | 2603.12604 | null |
| 2026-03-13 | Expert Pyramid Tuning: Efficient Parameter Fine-Tuning for Expertise-Driven Task Allocation | Jia-Chen Zhang et.al. | 2603.12577 | null |
| 2026-03-13 | Spatio-Semantic Expert Routing Architecture with Mixture-of-Experts for Referring Image Segmentation | Alaa Dalaq et.al. | 2603.12538 | null |
| 2026-03-12 | TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition | Prabhu Vellaisamy et.al. | 2603.12465 | null |
| 2026-03-12 | NeuroLoRA: Context-Aware Neuromodulation for Parameter-Efficient Multi-Task Adaptation | Yuxin Yang et.al. | 2603.12378 | null |
| 2026-03-12 | A Two-Stage Dual-Modality Model for Facial Emotional Expression Recognition | Jiajun Sun et.al. | 2603.12221 | null |
| 2026-03-12 | CrossEarth-SAR: A SAR-Centric and Billion-Scale Geospatial Foundation Model for Domain Generalizable Semantic Segmentation | Ziqi Ye et.al. | 2603.12008 | null |
| 2026-03-12 | AdaFuse: Accelerating Dynamic Adapter Inference via Token-Level Pre-Gating and Fused Kernel Optimization | Qiyang Li et.al. | 2603.11873 | null |
| 2026-03-12 | Expert Threshold Routing for Autoregressive Language Modeling with Dynamic Computation Allocation and Load Balancing | Hanchi Sun et.al. | 2603.11535 | null |
| 2026-03-11 | Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers | Mynampati Sri Ranganadha Avinash et.al. | 2603.11114 | null |
| 2026-03-11 | Sample-and-Search: An Effective Algorithm for Learning-Augmented k-Median Clustering in High dimensions | Kangke Cheng et.al. | 2603.10721 | null |
| 2026-03-11 | UniStitch: Unifying Semantic and Geometric Features for Image Stitching | Yuan Mei et.al. | 2603.10568 | null |
| 2026-03-11 | Optimal Expert-Attention Allocation in Mixture-of-Experts: A Scalable Law for Dynamic Model Design | Junzhuo Li et.al. | 2603.10379 | null |
| 2026-03-12 | The Orthogonal Vulnerabilities of Generative AI Watermarks: A Comparative Empirical Benchmark of Spatial and Latent Provenance | Jesse Yu et.al. | 2603.10323 | null |
| 2026-03-10 | Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions | Mingyang Song et.al. | 2603.09938 | null |
| 2026-03-10 | Quantifying the Necessity of Chain of Thought through Opaque Serial Depth | Jonah Brown-Cohen et.al. | 2603.09786 | null |
| 2026-03-10 | MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants | Zuhao Zhang et.al. | 2603.09652 | null |
| 2026-03-10 | MORE-R1: Guiding LVLM for Multimodal Object-Entity Relation Extraction via Stepwise Reasoning with Reinforcement Learning | Xiang Yuan et.al. | 2603.09478 | null |
| 2026-03-12 | Multi-tasking through quantum annealing | Jargalsaikhan Artag et.al. | 2603.09468 | null |
| 2026-03-10 | Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers | Albus Yizhuo Li et.al. | 2603.09453 | null |
| 2026-03-10 | Exploring Modality-Aware Fusion and Decoupled Temporal Propagation for Multi-Modal Object Tracking | Shilei Wang et.al. | 2603.09287 | null |
| 2026-03-10 | Acoustic and Semantic Modeling of Emotion in Spoken Language | Soumya Dutta et.al. | 2603.09212 | null |
| 2026-03-10 | GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models | Md Selim Sarowar et.al. | 2603.09079 | null |
| 2026-03-09 | The $qs$ Inequality: Quantifying the Double Penalty of Mixture-of-Experts at Inference | Vignesh Adhinarayanan et.al. | 2603.08960 | null |
| 2026-03-09 | ConFu: Contemplate the Future for Better Speculative Sampling | Zongyue Qin et.al. | 2603.08899 | null |
| 2026-03-09 | Microwave response of electrically driven spins in a three-qubit quantum processor | Tanner M. Janda et.al. | 2603.08577 | null |
| 2026-03-09 | LAR-MoE: Latent-Aligned Routing for Mixture of Experts in Robotic Imitation Learning | Ariel Rodriguez et.al. | 2603.08476 | null |
| 2026-03-09 | Amplitude Analysis of Singly Cabibbo-Suppressed Decay $Λ^{+}_{c}\to p K^{+} K^{-}$ | BESIII Collaboration et.al. | 2603.08469 | null |
| 2026-03-09 | IronEngine: Towards General AI Assistant | Xi Mo et.al. | 2603.08425 | null |
| 2026-03-09 | Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows | Shentong Mo et.al. | 2603.08126 | null |
| 2026-03-09 | An improved measurement of $η^\prime\rightarrow e^{+}e^{-}ω$ | BESIII Collaboration et.al. | 2603.08120 | null |
| 2026-03-09 | SAMoE-VLA: A Scene Adaptive Mixture-of-Experts Vision-Language-Action Model for Autonomous Driving | Zihan You et.al. | 2603.08113 | null |
| 2026-03-09 | Deterministic Differentiable Structured Pruning for Large Language Models | Weiyu Huang et.al. | 2603.08065 | null |
| 2026-03-09 | Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization | Jingwei Li et.al. | 2603.08022 | null |
| 2026-03-09 | Scaling Machine Learning Interatomic Potentials with Mixtures of Experts | Yuzhi Liu et.al. | 2603.07977 | null |
| 2026-03-09 | Structural Design and Performance Analysis of Laser Transmitting Telescope for Space Gravitational Wave Detection | Long Yongtao et.al. | 2603.07967 | null |
| 2026-03-09 | SGG-R$^{\rm 3}$: From Next-Token Prediction to End-to-End Unbiased Scene Graph Generation | Jiaye Feng et.al. | 2603.07961 | null |
| 2026-03-09 | SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans | Hansi Zeng et.al. | 2603.07853 | null |
| 2026-03-08 | Scalable Training of Mixture-of-Experts Models with Megatron Core | Zijie Yan et.al. | 2603.07685 | null |
| 2026-03-08 | AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots | Likui Zhang et.al. | 2603.07648 | null |
| 2026-03-08 | Mixed Effects Mixture of Experts: Modeling Double Heterogeneous Trajectories | Xinkai Yue et.al. | 2603.07479 | null |
| 2026-03-08 | UnSCAR: Universal, Scalable, Controllable, and Adaptable Image Restoration | Debabrata Mandal et.al. | 2603.07406 | null |
| 2026-03-07 | Scheduling Parallel Optical Circuit Switches for AI Training | Kevin Liang et.al. | 2603.07373 | null |
| 2026-03-07 | Mozart: Modularized and Efficient MoE Training on 3.5D Wafer-Scale Chiplet Architectures | Shuqing Luo et.al. | 2603.07006 | null |
| 2026-03-06 | Swimba: Switch Mamba Model Scales State Space Models | Zhixu Du et.al. | 2603.06938 | null |
| 2026-03-06 | PaQ-DETR: Learning Pattern and Quality-Aware Dynamic Queries for Object Detection | Zhengjian Kang et.al. | 2603.06917 | null |
| 2026-03-06 | RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering | Gaia A. Bertolino et.al. | 2603.06542 | null |
| 2026-03-06 | A Mixture-of-Experts Framework for Practical Hybrid-Quantum Models in Credit Card Fraud Detection | Rodrigo Chaves et.al. | 2603.06473 | null |
| 2026-03-06 | MoEMambaMIL: Structure-Aware Selective State Space Modeling for Whole-Slide Image Analysis | Dongqing Xie et.al. | 2603.06378 | null |
| 2026-03-06 | MoEless: Efficient MoE LLM Serving via Serverless Computing | Hanfei Yu et.al. | 2603.06350 | null |
| 2026-03-06 | WMoE-CLIP: Wavelet-Enhanced Mixture-of-Experts Prompt Learning for Zero-Shot Anomaly Detection | Peng Chen et.al. | 2603.06313 | null |
| 2026-03-06 | GazeMoE: Perception of Gaze Target with Mixture-of-Experts | Zhuangzhuang Dai et.al. | 2603.06256 | null |
| 2026-03-06 | EvoESAP: Non-Uniform Expert Pruning for Sparse MoE | Zongfang Liu et.al. | 2603.06003 | null |
| 2026-03-06 | MoE Lens – An Expert Is All You Need | Marmik Chaudhari et.al. | 2603.05806 | null |
| 2026-03-06 | Sparse Crosscoders for diffing MoEs and Dense models | Marmik Chaudhari et.al. | 2603.05805 | null |
| 2026-03-05 | Change Point Detection for Cell Populations Measured via Flow Cytometry | Yik Lun Kei et.al. | 2603.05700 | null |
| 2026-03-05 | FreeTxt-Vi: A Benchmarked Vietnamese-English Toolkit for Segmentation, Sentiment, and Summarisation | Hung Nguyen Huy et.al. | 2603.05690 | null |
| 2026-03-05 | Multi-channel joint analysis of the exotic charmonium-like state $T_{c\bar{c}}(4020)$ | BESIII Collaboration et.al. | 2603.05564 | null |
| 2026-03-05 | VietJobs: A Vietnamese Job Advertisement Dataset | Hieu Pham Dinh et.al. | 2603.05262 | null |
| 2026-03-05 | NeuronMoE: Neuron-Guided Mixture-of-Experts for Efficient Multilingual LLM Extension | Rongzhi Li et.al. | 2603.05046 | null |
| 2026-03-05 | Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation | Yilong Chen et.al. | 2603.04971 | null |
| 2026-03-05 | Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling | Yong Liu et.al. | 2603.04791 | null |
| 2026-03-05 | TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings | Yebo Wu et.al. | 2603.04772 | null |
| 2026-03-04 | ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model | Yuhao Xu et.al. | 2603.04589 | null |
| 2026-03-04 | Augmenting representations with scientific papers | Nicolò Oreste Pinciroli Vago et.al. | 2603.04516 | null |
| 2026-03-04 | RANGER: Sparsely-Gated Mixture-of-Experts with Adaptive Retrieval Re-ranking for Pathology Report Generation | Yixin Chen et.al. | 2603.04348 | null |
| 2026-03-04 | CAMMSR: Category-Guided Attentive Mixture of Experts for Multimodal Sequential Recommendation | Jinfeng Xu et.al. | 2603.04320 | null |
| 2026-03-04 | Precise measurement of the form factors in $D^0\rightarrow K^*(892)^-\ell^+ν_{\ell}$ and observation of $D^0\rightarrow K_2^*(1430)^-\ell^+ν_{\ell}$ | BESIII Collaboration et.al. | 2603.04136 | null |
| 2026-03-04 | UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization | Qianfeng Yang et.al. | 2603.03967 | null |
| 2026-03-04 | Glass Segmentation with Fusion of Learned and General Visual Features | Risto Ojala et.al. | 2603.03718 | null |
| 2026-03-04 | Plasmonic polaron in self-intercalated 1T-TiS2 | Byoung Ki Choi et.al. | 2603.03663 | null |
| 2026-03-03 | Modeling Cross-vision Synergy for Unified Large Vision Model | Shengqiong Wu et.al. | 2603.03564 | null |
| 2026-03-03 | Beyond Language Modeling: An Exploration of Multimodal Pretraining | Shengbang Tong et.al. | 2603.03276 | null |
| 2026-03-03 | Search for a massless particle beyond the Standard Model in the $Ξ^0\toΛ+ \text{invisible}$ decay | BESIII Collaboration et.al. | 2603.03199 | null |
| 2026-03-04 | MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection | Jun Yeong Park et.al. | 2603.03101 | null |
| 2026-03-03 | CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots | Shihao Ma et.al. | 2603.03067 | null |
| 2026-03-03 | EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education | Baoliang Chen et.al. | 2603.03066 | null |
| 2026-03-03 | Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs | Wuyue Zhang et.al. | 2603.02731 | null |
| 2026-03-03 | TenExp: Mixture-of-Experts-Based Tensor Decomposition Structure Search Framework | Ting-Wei Zhou et.al. | 2603.02720 | null |
| 2026-03-03 | MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration | Lingshun Kong et.al. | 2603.02710 | null |
| 2026-03-03 | Addressing Missing and Noisy Modalities in One Solution: Unified Modality-Quality Framework for Low-quality Multimodal Data | Sijie Mai et.al. | 2603.02695 | null |
| 2026-03-03 | Robust Heterogeneous Analog-Digital Computing for Mixture-of-Experts Models with Theoretical Generalization Guarantees | Mohammed Nowaz Rabbani Chowdhury et.al. | 2603.02633 | null |
| 2026-03-02 | Search for the charmonium weak decay $ψ(2S)\to D_s^-π^+ + c.c.$ and $ψ(2S)\to D_s^-ρ^+ + c.c.$ | BESIII Collaboration et.al. | 2603.01777 | null |
| 2026-03-02 | DynaMoE: Dynamic Token-Level Expert Activation with Layer-Wise Adaptive Capacity for Mixture-of-Experts Neural Networks | Gökdeniz Gülmez et.al. | 2603.01697 | null |
| 2026-03-02 | PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification | Jian Yu et.al. | 2603.01547 | null |
| 2026-03-02 | Multimodal Mixture-of-Experts with Retrieval Augmentation for Protein Active Site Identification | Jiayang Wu et.al. | 2603.01511 | null |
| 2026-03-02 | DOCFORGE-BENCH: A Comprehensive Benchmark for Document Forgery Detection and Analysis | Zengqi Zhao et.al. | 2603.01433 | null |
| 2026-03-03 | UETrack: A Unified and Efficient Framework for Single Object Tracking | Ben Kang et.al. | 2603.01412 | null |
| 2026-03-02 | Fed-GAME: Personalized Federated Learning with Graph Attention Mixture-of-Experts For Time-Series Forecasting | Yi Li et.al. | 2603.01363 | null |
| 2026-03-01 | Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning | Hamed Damirchi et.al. | 2603.01326 | null |
| 2026-03-01 | Fast Confidence-Aware Human Prediction via Hardware-accelerated Bayesian Inference for Safe Robot Navigation | Michael Lu et.al. | 2603.01122 | null |
| 2026-03-01 | TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading | Yudong Pan et.al. | 2603.01058 | null |
| 2026-03-01 | Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving | Xubo Zhu et.al. | 2603.01007 | null |
| 2026-02-28 | MME: Mixture of Mesh Experts with Random Walk Transformer Gating | Amir Belder et.al. | 2603.00828 | null |
| 2026-02-28 | First Amplitude Analysis of $D^0\rightarrow K^-π^0e^+ν_e$ and Observation of $D^0\rightarrow K^*_2(1430)^-e^+ν_e$ | BESIII Collaboration et.al. | 2603.00743 | null |
| 2026-02-28 | K^2-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control | Zhe Wu et.al. | 2603.00676 | null |
| 2026-02-28 | Precise Measurement and Control of Radon Progeny on Detector Surfaces | C. B. Z. Luo et.al. | 2603.00647 | null |
| 2026-02-28 | CoMoL: Efficient Mixture of LoRA Experts via Dynamic Core Space Merging | Jie Cao et.al. | 2603.00573 | null |
| 2026-02-27 | CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning | Yuxuan Liu et.al. | 2602.24142 | null |
| 2026-02-27 | Precision Studies and Searches for CP Asymmetries in the Inclusive Decay $Λ_{c}^{+}\to ΛX$ | BESIII Collaboration et.al. | 2602.24089 | null |
| 2026-02-27 | Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization | Chenwei Jia et.al. | 2602.24059 | null |
| 2026-02-27 | Measurement of Born Cross Sections for $e^+e^-\toΣ^-\barΣ^+$ at $\sqrt{s}=3.51-4.95$ GeV and Observation of $ψ(3770)\toΣ^-\barΣ^+$ | BESIII Collaboration et.al. | 2602.23835 | null |
| 2026-02-27 | ProductResearch: Training E-Commerce Deep Research Agents via Multi-Agent Synthetic Trajectory Distillation | Jiangyuan Wang et.al. | 2602.23716 | null |
| 2026-02-26 | Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG | Hanning Guo et.al. | 2602.23410 | null |
| 2026-02-26 | A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations | Soumya Dutta et.al. | 2602.23300 | null |
| 2026-02-26 | Learning Physical Operators using Neural Operators | Vignesh Gopakumar et.al. | 2602.23113 | null |
| 2026-02-26 | Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability | Bum Jun Kim et.al. | 2602.22988 | null |
| 2026-02-26 | pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation | Shentong Mo et.al. | 2602.22938 | null |
| 2026-02-26 | MEDNA-DFM: A Dual-View FiLM-MoE Model for Explainable DNA Methylation Prediction | Yi He et.al. | 2602.22850 | null |
| 2026-02-26 | DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation | Hao Zheng et.al. | 2602.22839 | null |
| 2026-02-26 | Productivity and Collaboration in Hybrid Agile Teams: An Interview Study | Elisabeth Mo et.al. | 2602.22835 | null |
| 2026-02-26 | Measurements of branching fractions of $Λ_{c}^{+}\toΣ^{0}K_{S}^{0}π^{+}$ and $Λ_{c}^{+}\toΣ^{0}K_{S}^{0}K^{+}$ | BESIII Collaboration et.al. | 2602.22754 | null |
| 2026-02-26 | IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation | Yanpei Guo et.al. | 2602.22700 | null |
| 2026-02-26 | Switch-Hurdle: A MoE Encoder with AR Hurdle Decoder for Intermittent Demand Forecasting | Fabian Muşat et.al. | 2602.22685 | null |
| 2026-02-26 | Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement | Shuchen Zhu et.al. | 2602.22681 | null |
| 2026-02-26 | Predictive variational inference for flexible regression models | Lucas Kock et.al. | 2602.22582 | null |
| 2026-02-26 | Towards Dynamic Dense Retrieval with Routing Strategy | Zhan Su et.al. | 2602.22547 | null |
| 2026-02-25 | NESTOR: A Nested MOE-based Neural Operator for Large-Scale PDE Pre-Training | Dengdi Sun et.al. | 2602.22059 | null |
| 2026-02-25 | Excitation: Momentum For Experts | Sagi Shaier et.al. | 2602.21798 | null |
| 2026-02-25 | Learning from Yesterday’s Error: An Efficient Online Learning Method for Traffic Demand Prediction | Xiannan Huang et.al. | 2602.21757 | null |
| 2026-02-25 | TiMi: Empower Time Series Transformers with Multimodal Mixture of Experts | Jiafeng Lin et.al. | 2602.21693 | null |
| 2026-02-25 | Multi-Layer Scheduling for MoE-Based LLM Reasoning | Yifan Sun et.al. | 2602.21626 | null |
| 2026-02-24 | A Path to an All-Sky Survey with Roman | Jiwon Jesse Han et.al. | 2602.21280 | null |
| 2026-02-24 | On infinite sets with no $3$ on a line | Moe Putterman et.al. | 2602.21275 | null |
| 2026-02-24 | ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments | Haley Li et.al. | 2602.21140 | null |
| 2026-02-24 | MUSE: Harnessing Precise and Diverse Semantics for Few-Shot Whole Slide Image Classification | Jiahao Xu et.al. | 2602.20873 | null |
| 2026-02-25 | GeCo-SRT: Geometry-aware Continual Adaptation for Robotic Cross-Task Sim-to-Real Transfer | Wenbo Yu et.al. | 2602.20871 | null |
| 2026-02-24 | Multi-time Loewner energy: rate function for large deviation | Mo Chen et.al. | 2602.20642 | null |
| 2026-02-24 | Precise Measurement of Matter-Antimatter Asymmetry with Entangled Hyperon Antihyperon Pairs | BESIII Collaboration et.al. | 2602.20524 | null |
| 2026-02-24 | Search for Light-Mass Fractionally Charged Particles in Space with DAMPE Experiment | F. Alemanno et.al. | 2602.20519 | null |
| 2026-02-24 | Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA | Nuocheng Yang et.al. | 2602.20492 | null |
| 2026-02-23 | Learning Discriminative and Generalizable Anomaly Detector for Dynamic Graph with Limited Supervision | Yuxing Tian et.al. | 2602.20019 | null |
| 2026-02-23 | Counterfactual Understanding via Retrieval-aware Multimodal Modeling for Time-to-Event Survival Prediction | Ha-Anh Hoang Nguyen et.al. | 2602.19987 | null |
| 2026-02-23 | ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting | Yuxing Tian et.al. | 2602.19969 | null |
| 2026-02-23 | A Replicate-and-Quantize Strategy for Plug-and-Play Load Balancing of Sparse Mixture-of-Experts LLMs | Zijie Liu et.al. | 2602.19938 | null |
| 2026-02-23 | Towards Dexterous Embodied Manipulation via Deep Multi-Sensory Fusion and Sparse Expert Scaling | Yirui Sun et.al. | 2602.19764 | null |
| 2026-02-23 | Multimodal Dataset Distillation Made Simple by Prototype-Guided Data Synthesis | Junhyeok Choi et.al. | 2602.19756 | null |
| 2026-02-23 | RAID: Retrieval-Augmented Anomaly Detection | Mingxiu Cai et.al. | 2602.19611 | null |
| 2026-02-23 | EMS-FL: Federated Tuning of Mixture-of-Experts in Satellite-Terrestrial Networks via Expert-Driven Model Splitting | Angzi Xu et.al. | 2602.19485 | null |
| 2026-02-22 | RegionRoute: Regional Style Transfer with Diffusion Model | Bowen Chen et.al. | 2602.19254 | null |
| 2026-02-22 | Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts | Toshihide Ubukata et.al. | 2602.19244 | null |
| 2026-02-22 | SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation | Yujie Lu et.al. | 2602.19213 | null |
| 2026-02-22 | JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation | Kai Liu et.al. | 2602.19163 | null |
| 2026-02-22 | K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model | Shiyi Cao et.al. | 2602.19128 | null |
| 2026-02-22 | Routing-Aware Explanations for Mixture of Experts Graph Models in Malware Detection | Hossein Shokouhinejad et.al. | 2602.19025 | null |
| 2026-02-21 | NeuroWise: A Multi-Agent LLM “Glass-Box” System for Practicing Double-Empathy Communication with Autistic Partners | Albert Tang et.al. | 2602.18962 | null |
| 2026-02-21 | Give Users the Wheel: Towards Promptable Recommendation Paradigm | Fuyuan Lyu et.al. | 2602.18929 | null |
| 2026-02-21 | Diverse properties of electron Forbush decreases revealed by the Dark Matter Particle Explorer | F. Alemanno et.al. | 2602.18743 | null |
| 2026-02-21 | Comprehensive measurement of $η^\prime$ photoproduction off the proton at $E_γ < 2.4$ GeV | N. Muramatsu et.al. | 2602.18675 | null |
| 2026-02-20 | Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory | Vatsal Agarwal et.al. | 2602.18434 | null |
| 2026-02-20 | RamanSeg: Interpretability-driven Deep Learning on Raman Spectra for Cancer Diagnosis | Chris Tomy et.al. | 2602.18119 | null |
| 2026-02-20 | DeepSVU: Towards In-depth Security-oriented Video Understanding via Unified Physical-world Regularized MoE | Yujie Jin et.al. | 2602.18019 | null |
| 2026-02-19 | Grassmannian Mixture-of-Experts: Concentration-Controlled Routing on Subspace Manifolds | Ibne Farabi Shihab et.al. | 2602.17798 | null |
| 2026-02-19 | Phase-Aware Mixture of Experts for Agentic Reinforcement Learning | Shengtian Yang et.al. | 2602.17038 | null |
| 2026-02-19 | Arcee Trinity Large Technical Report | Varun Singh et.al. | 2602.17004 | null |
| 2026-02-19 | Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation | Yan Wang et.al. | 2602.16990 | null |
| 2026-02-18 | Claim Automation using Large Language Model | Zhengda Mo et.al. | 2602.16836 | null |
| 2026-02-18 | Efficient Tail-Aware Generative Optimization via Flow Model Fine-Tuning | Zifan Wang et.al. | 2602.16796 | null |
| 2026-02-18 | Geometric Neural Operators via Lie Group-Constrained Latent Dynamics | Jiaquan Zhang et.al. | 2602.16209 | null |
| 2026-02-18 | OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis | Tianwei Lin et.al. | 2602.16110 | null |
| 2026-02-18 | Federated Graph AGI for Cross-Border Insider Threat Intelligence in Government Financial Schemes | Srikumar Nayak et.al. | 2602.16109 | null |
| 2026-02-17 | MoE-Spec: Expert Budgeting for Efficient Speculative Decoding | Bradley McDanel et.al. | 2602.16052 | null |
| 2026-02-17 | ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns | Ziyu Zhao et.al. | 2602.15521 | null |
| 2026-02-17 | GMAIL: Generative Modality Alignment for generated Image Learning | Shentong Mo et.al. | 2602.15368 | null |
| 2026-02-16 | Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs | Ali Khalesi et.al. | 2602.15091 | null |
| 2026-02-13 | RynnBrain: Open Embodied Foundation Models | Ronghao Dang et.al. | 2602.14979 | null |
| 2026-02-16 | Topological and arithmetic characteristics about products of projective lines with complex tori | Jia-Li Mo et.al. | 2602.14745 | null |
| 2026-02-16 | DriveFine: Refining-Augmented Masked Diffusion VLA for Precise and Robust Driving | Chenxu Dang et.al. | 2602.14577 | null |
| 2026-02-15 | DeepFusion: Accelerating MoE Training via Federated Knowledge Distillation from Heterogeneous Edge Devices | Songyuan Li et.al. | 2602.14301 | null |
| 2026-02-15 | MILD: Multi-Intent Learning and Disambiguation for Proactive Failure Prediction in Intent-based Networking | Md. Kamrul Hossain et.al. | 2602.14283 | null |
| 2026-02-15 | Multi-Agent Debate: A Unified Agentic Framework for Tabular Anomaly Detection | Pinqiao Wang et.al. | 2602.14251 | null |
| 2026-02-15 | Fast Catch-Up, Late Switching: Optimal Batch Size Scheduling via Functional Scaling Laws | Jinbo Wang et.al. | 2602.14208 | null |
| 2026-02-15 | Synergistic Intra- and Cross-Layer Regularization Losses for MoE Expert Specialization | Rizhen Hu et.al. | 2602.14159 | null |
| 2026-02-15 | REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment | Kai Ye et.al. | 2602.14065 | null |
| 2026-02-15 | LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts | Yang Liu et.al. | 2602.14060 | null |
| 2026-02-15 | Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models | Sajjad Kachuee et.al. | 2602.14039 | null |
| 2026-02-15 | Eureka-Audio: Triggering Audio Intelligence in Compact Language Models | Dan Zhang et.al. | 2602.13954 | null |
| 2026-02-14 | Assessing Cybersecurity Risks and Traffic Impact in Connected Autonomous Vehicles | Saurav Silwal et.al. | 2602.13898 | null |
| 2026-02-14 | Mixture-of-experts Wishart model for covariance matrices with an application to Cancer drug screening | The Tien Mai et.al. | 2602.13888 | null |
| 2026-02-13 | Dyad: a binary-star dynamics and statistics library for Python | Amery Gration et.al. | 2602.13388 | null |
| 2026-02-13 | Improved measurements of the coherence factors and strong-phase differences in $D\to K^-π^+π^+π^-$ and $D\to K^-π^+π^0$ with quantum-correlated $D\bar{D}$ decays | BESIII Collaboration et.al. | 2602.13002 | null |
| 2026-02-13 | Aspect-Based Sentiment Analysis for Future Tourism Experiences: A BERT-MoE Framework for Persian User Reviews | Hamidreza Kazemi Taskooh et.al. | 2602.12778 | null |
| 2026-02-13 | Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning | Jon Irureta et.al. | 2602.12708 | null |
| 2026-02-13 | Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers | Anrui Chen et.al. | 2602.12587 | null |
| 2026-02-13 | SD-MoE: Spectral Decomposition for Effective Expert Specialization | Ruijun Huang et.al. | 2602.12556 | null |
| 2026-02-13 | Decoder-only Conformer with Modality-aware Sparse Mixtures of Experts for ASR | Jaeyoung Lee et.al. | 2602.12546 | null |
| 2026-02-12 | Query-focused and Memory-aware Reranker for Long Context Processing | Yuqing Li et.al. | 2602.12192 | null |
| 2026-02-12 | Measurement of the singly Cabibbo-suppressed decay $Λ_c^+\to pη^\prime$ with Deep Learning | BESIII Collaboration et.al. | 2602.11974 | null |
| 2026-02-12 | Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration | Akhiad Bercovich et.al. | 2602.11937 | null |
| 2026-02-12 | Deep Kernel Fusion for Transformers | Zixi Zhang et.al. | 2602.11808 | null |
| 2026-02-12 | LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training | Xinyi Liu et.al. | 2602.11686 | null |
| 2026-02-12 | Evolutionary Router Feature Generation for Zero-Shot Graph Anomaly Detection with Mixture-of-Experts | Haiyang Jiang et.al. | 2602.11622 | null |
| 2026-02-12 | Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm | Jinrui Zhang et.al. | 2602.11543 | null |
| 2026-02-12 | Adaptive Milestone Reward for GUI Agents | Congmin Zheng et.al. | 2602.11524 | null |
| 2026-02-12 | Observation of a New Excited $Σ$ State in $ψ(3686)\to\bar{p}K^+Σ^0+c.c.$ | BESIII Collaboration et.al. | 2602.11501 | null |
| 2026-02-11 | Charting Empirical Laws for LLM Fine-Tuning in Scientific Multi-Discipline Learning | Lintao Wang et.al. | 2602.11215 | null |
| 2026-02-11 | MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs | Yupu Gu et.al. | 2602.10965 | null |
| 2026-02-11 | CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control | Riccardo Barbano et.al. | 2602.10933 | null |
| 2026-02-11 | VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training | Guobin Shen et.al. | 2602.10693 | null |
| 2026-02-11 | Multimodal Priors-Augmented Text-Driven 3D Human-Object Interaction Generation | Yin Wang et.al. | 2602.10659 | null |
| 2026-02-11 | A Vision-Language Foundation Model for Zero-shot Clinical Collaboration and Automated Concept Discovery in Dermatology | Siyuan Yan et.al. | 2602.10624 | null |
| 2026-02-11 | Supercharging Packet-level Network Simulation of Large Model Training via Memoization and Fast-Forwarding | Fei Long et.al. | 2602.10615 | null |
| 2026-02-11 | Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters | Ailin Huang et.al. | 2602.10604 | null |
| 2026-02-11 | Neural Additive Experts: Context-Gated Experts for Controllable Model Additivity | Guangzhi Xiong et.al. | 2602.10585 | null |
| 2026-02-12 | 3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars | Zhongju Wang et.al. | 2602.10516 | null |
| 2026-02-10 | Area-Efficient In-Memory Computing for Mixture-of-Experts via Multiplexing and Caching | Hanyuan Gao et.al. | 2602.10254 | null |
| 2026-02-10 | TDE 2025abcr: A Tidal Disruption Event in the Outskirts of a Massive Galaxy | Robert Stein et.al. | 2602.10180 | null |
| 2026-02-10 | MalMoE: Mixture-of-Experts Enhanced Encrypted Malicious Traffic Detection Under Graph Drift | Yunpeng Tan et.al. | 2602.10157 | null |
| 2026-02-10 | Diverse Skill Discovery for Quadruped Robots via Unsupervised Learning | Ruopeng Cui et.al. | 2602.09767 | null |
| 2026-02-10 | Revealing the Challenges of Attention-FFN Disaggregation for Modern MoE Models and Hardware Systems | Guowei Liu et.al. | 2602.09721 | null |
| 2026-02-10 | First observation of the $η_{c}\toΞ^{0} \barΞ^{0}$ decay | BESIII Collaboration et.al. | 2602.09652 | null |
| 2026-02-10 | DR.Experts: Differential Refinement of Distortion-Aware Experts for Blind Image Quality Assessment | Bohan Fu et.al. | 2602.09531 | null |
| 2026-02-10 | SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity | Yukun Zhang et.al. | 2602.09386 | null |
| 2026-02-10 | Effective MoE-based LLM Compression by Exploiting Heterogeneous Inter-Group Experts Routing Frequency and Information Density | Zhendong Mi et.al. | 2602.09316 | null |
| 2026-02-09 | Generalizing GNNs with Tokenized Mixture of Experts | Xiaoguang Guo et.al. | 2602.09258 | null |
| 2026-02-09 | UI-Venus-1.5 Technical Report | Veuns-Team et.al. | 2602.09082 | null |
| 2026-02-09 | DirMoE: Dirichlet-routed Mixture of Experts | Amirhossein Vahidi et.al. | 2602.09001 | null |
| 2026-02-09 | OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation | Yehua Huang et.al. | 2602.08896 | null |
| 2026-02-09 | FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models | Annemette Brok Pirchert et.al. | 2602.08818 | null |
| 2026-02-10 | MOVA: Towards Scalable and Synchronized Video-Audio Generation | SII-OpenMOSS Team et.al. | 2602.08794 | null |
| 2026-02-10 | Redundancy-Free View Alignment for Multimodal Human Activity Recognition with Arbitrarily Missing Views | Duc-Anh Nguyen et.al. | 2602.08755 | null |
| 2026-02-09 | Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing | Jona te Lintelo et.al. | 2602.08741 | null |
| 2026-02-09 | 6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks | Mohamed Amine Ferrag et.al. | 2602.08675 | null |
| 2026-02-10 | Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models | Mingzi Cao et.al. | 2602.08658 | null |
| 2026-02-09 | Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs | Yukun Jiang et.al. | 2602.08621 | null |
| 2026-02-09 | Giant Magnetocaloric Effect in a High-Spin Shastry-Sutherland Dipolar Magnet | Jianjian Gong et.al. | 2602.08497 | null |
| 2026-02-09 | TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration | Linye Wei et.al. | 2602.08404 | null |
| 2026-02-09 | Tighnari v2: Mitigating Label Noise and Distribution Shift in Multimodal Plant Distribution Prediction via Mixture of Experts and Weakly Supervised Learning | Haixu Liu et.al. | 2602.08282 | null |
| 2026-02-09 | Large Language Models in Peer-Run Community Behavioral Health Services: Understanding Peer Specialists and Service Users’ Perspectives on Opportunities, Risks, and Mitigation Strategies | Cindy Peng et.al. | 2602.08187 | null |
| 2026-02-08 | Multimodal normative modeling in Alzheimer's Disease with introspective variational autoencoders | Sayantan Kumar et.al. | 2602.08077 | null |
| 2026-02-08 | Efficient and Adaptable Detection of Malicious LLM Prompts via Bootstrap Aggregation | Shayan Ali Hassan et.al. | 2602.08062 | null |
| 2026-02-08 | Enhanced Mixture 3D CGAN for Completion and Generation of 3D Objects | Yahia Hamdi et.al. | 2602.08046 | null |
| 2026-02-08 | The Rise of Sparse Mixture-of-Experts: A Survey from Algorithmic Foundations to Decentralized Architectures and Vertical Domain Applications | Dong Pan et.al. | 2602.08019 | null |
| 2026-02-08 | Fast Model Selection and Stable Optimization for Softmax-Gated Multinomial-Logistic Mixture of Experts Models | TrungKhang Tran et.al. | 2602.07997 | null |
| 2026-02-08 | Thinking in Structures: Evaluating Spatial Intelligence through Reasoning on Constrained Manifolds | Chen Yang et.al. | 2602.07864 | null |
| 2026-02-07 | SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models | Juntong Wu et.al. | 2602.07616 | null |
| 2026-02-06 | DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos | Shenyuan Gao et.al. | 2602.06949 | null |
| 2026-02-06 | Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing | Meng Lou et.al. | 2602.06862 | null |
| 2026-02-06 | POP: Online Structural Pruning Enables Efficient Inference of Large Foundation Models | Yi Chen et.al. | 2602.06822 | null |
| 2026-02-06 | SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers | Shentong Mo et.al. | 2602.06706 | null |
| 2026-02-06 | Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making | Baichuan-M3 Team et.al. | 2602.06570 | null |
| 2026-02-06 | TokenMixer-Large: Scaling Up Large Ranking Models in Industrial Recommenders | Yuchen Jiang et.al. | 2602.06563 | null |
| 2026-02-06 | HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction | Shengxuan Qiu et.al. | 2602.06527 | null |
| 2026-02-05 | GRP-Obliteration: Unaligning LLMs With a Single Unlabeled Prompt | Mark Russinovich et.al. | 2602.06258 | null |
| 2026-02-05 | To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training | Meghana Madhyastha et.al. | 2602.06183 | null |
| 2026-02-05 | MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models | Nurbek Tastan et.al. | 2602.06154 | null |
| 2026-02-05 | OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale | Jingze Shi et.al. | 2602.05711 | null |
| 2026-02-05 | Hidden simplicity in AdS spinning Mellin amplitudes via scaffolding | Song He et.al. | 2602.05568 | null |
| 2026-02-05 | M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining | Rui Lv et.al. | 2602.05429 | null |
| 2026-02-05 | Mergers Drive Structural Complexity but Not Starbursts in Lyman-$α$ Emitters at $3 < z < 4$: A JWST Spatially Resolved View | Qi Song et.al. | 2602.05411 | null |
| 2026-02-05 | Decision-Focused Sequential Experimental Design: A Directional Uncertainty-Guided Approach | Beichen Wan et.al. | 2602.05340 | null |
| 2026-02-05 | Surgery: Mitigating Harmful Fine-Tuning for Large Language Models via Attention Sink | Guozhi Liu et.al. | 2602.05228 | null |
| 2026-02-04 | Rule-Based Spatial Mixture-of-Experts U-Net for Explainable Edge Detection | Bharadwaj Dogga et.al. | 2602.05100 | null |
| 2026-02-04 | Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism | Chenwei Cui et.al. | 2602.04870 | null |
| 2026-02-04 | PDF-HR: Pose Distance Fields for Humanoid Robots | Yi Gu et.al. | 2602.04851 | null |
| 2026-02-04 | ERNIE 5.0 Technical Report | Haifeng Wang et.al. | 2602.04705 | null |
| 2026-02-04 | Let Experts Feel Uncertainty: A Multi-Expert Label Distribution Approach to Probabilistic Time Series Forecasting | Zhen Zhou et.al. | 2602.04678 | null |
| 2026-02-04 | RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models | Jiacheng Liang et.al. | 2602.04448 | null |
| 2026-02-04 | Mixture of Masters: Sparse Chess Language Models with Player Routing | Giacomo Frisoni et.al. | 2602.04447 | null |
| 2026-02-04 | Study of $\barΛ$-$p$ Annihilation into Light Mesons | BESIII Collaboration et.al. | 2602.04276 | null |
| 2026-02-04 | Universal Quantized Berry-Dipole Flat Bands | Qingyang Mo et.al. | 2602.04194 | null |
| 2026-02-04 | OMG-Agent: Toward Robust Missing Modality Generation with Decoupled Coarse-to-Fine Agentic Workflows | Ruiting Dai et.al. | 2602.04144 | null |
| 2026-02-04 | Expert Selections In MoE Models Reveal (Almost) As Much As Text | Amir Nuriyev et.al. | 2602.04105 | null |
| 2026-02-03 | SpecMD: A Comprehensive Study On Speculative Expert Prefetching | Duc Hoang et.al. | 2602.03921 | null |
| 2026-02-03 | UniGeM: Unifying Data Mixing and Selection via Geometric Exploration and Mining | Changhao Wang et.al. | 2602.03772 | null |
| 2026-02-03 | HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing | Yizhao Gao et.al. | 2602.03560 | null |
| 2026-02-03 | DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCs | Zeyu Zhu et.al. | 2602.03495 | null |
| 2026-02-03 | Scaling Continual Learning with Bi-Level Routing Mixture-of-Experts | Meng Lou et.al. | 2602.03473 | null |
| 2026-02-03 | VIRAL: Visual In-Context Reasoning via Analogy in Diffusion Transformers | Zhiwen Li et.al. | 2602.03210 | null |
| 2026-02-03 | Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry | Ye Su et.al. | 2602.03204 | null |
| 2026-02-03 | Aligning Forest and Trees in Images and Long Captions for Visually Grounded Understanding | Byeongju Woo et.al. | 2602.02977 | null |
| 2026-02-02 | Decision-Focused Optimal Transport | Suhan Liu et.al. | 2602.02800 | null |
| 2026-02-02 | Loss mechanisms of microwave frequency acoustic waves in thin film lithium niobate | Qixuan Lin et.al. | 2602.02797 | null |
| 2026-02-02 | SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning | Qifan Yu et.al. | 2602.02472 | null |
| 2026-02-02 | Certain Head, Uncertain Tail: Expert-Sample for Test-Time Scaling in Fine-Grained MoE | Yuanteng Chen et.al. | 2602.02443 | null |
| 2026-02-02 | DFKI-Speech System for WildSpoof Challenge: A robust framework for SASV In-the-Wild | Arnab Das et.al. | 2602.02286 | null |
| 2026-02-02 | MoLF: Mixture-of-Latent-Flow for Pan-Cancer Spatial Gene Expression Prediction from Histology | Susu Hu et.al. | 2602.02282 | null |
| 2026-02-02 | Kimi K2.5: Visual Agentic Intelligence | Kimi Team et.al. | 2602.02276 | null |
| 2026-02-02 | vLLM-Omni: Fully Disaggregated Serving for Any-to-Any Multimodal Models | Peiqi Yin et.al. | 2602.02204 | null |
| 2026-02-02 | No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs | Liyan Xu et.al. | 2602.02103 | null |
| 2026-02-02 | Edge-Aligned Initialization of Kernels for Steered Mixture-of-Experts | Martin Determann et.al. | 2602.02031 | null |
| 2026-02-02 | SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning | Zhen-Hao Xie et.al. | 2602.01990 | null |
| 2026-02-02 | Mixture-of-Experts with Intermediate CTC Supervision for Accented Speech Recognition | Wonjun Lee et.al. | 2602.01967 | null |
| 2026-02-02 | SOPRAG: Multi-view Graph Experts Retrieval for Industrial Standard Operating Procedures | Liangtao Lin et.al. | 2602.01858 | null |
| 2026-02-02 | From Knowing to Doing Precisely: A General Self-Correction and Termination Framework for VLA models | Wentao Zhang et.al. | 2602.01811 | null |
| 2026-02-02 | Mutual-Guided Expert Collaboration for Cross-Subject EEG Classification | Zhi Zhang et.al. | 2602.01728 | null |
| 2026-02-02 | AdNanny: One Reasoning LLM for All Offline Ads Recommendation Tasks | Nan Hu et.al. | 2602.01563 | null |
| 2026-02-01 | A Statistical Theory of Gated Attention through the Lens of Hierarchical Mixture of Experts | Viet Nguyen et.al. | 2602.01468 | null |
| 2026-02-01 | Rethinking Multinomial Logistic Mixture of Experts with Sigmoid Gating Function | Tuan Minh Pham et.al. | 2602.01466 | null |
| 2026-02-01 | Exposing and Defending the Achilles’ Heel of Video Mixture-of-Experts | Songping Wang et.al. | 2602.01369 | null |
| 2026-02-01 | Observation of $\barΛp\to K^{+}π^{+}π^{-}π^{0}$ and $\barΛp\to K^{+}π^{+}π^{-}2π^{0}$ | BESIII Collaboration et.al. | 2602.01282 | null |
| 2026-02-01 | MiTA Attention: Efficient Fast-Weight Scaling via a Mixture of Top-$k$ Activations | Qishuai Wen et.al. | 2602.01219 | null |
| 2026-02-01 | Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse | Zizhuo Fu et.al. | 2602.01203 | null |
| 2026-01-30 | Omni-fMRI: A Universal Atlas-Free fMRI Foundation Model | Mo Wang et.al. | 2601.23090 | null |
| 2026-01-30 | UrbanMoE: A Sparse Multi-Modal Mixture-of-Experts Framework for Multi-Task Urban Region Profiling | Pingping Liu et.al. | 2601.22746 | null |
| 2026-01-30 | A Cross-Domain Graph Learning Protocol for Single-Step Molecular Geometry Refinement | Chengchun Liu et.al. | 2601.22723 | null |
| 2026-01-30 | A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization | Shiye Lei et.al. | 2601.22718 | null |
| 2026-01-30 | A Unified Study of LoRA Variants: Taxonomy, Review, Codebase, and Empirical Evaluation | Haonan He et.al. | 2601.22708 | null |
| 2026-01-30 | Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments | Jinwoo Jang et.al. | 2601.22647 | null |
| 2026-01-30 | SpanNorm: Reconciling Training Stability and Performance in Deep Transformers | Chao Wang et.al. | 2601.22580 | null |
| 2026-01-30 | SHED Light on Segmentation for Dense Prediction | Seung Hyun Lee et.al. | 2601.22529 | null |
| 2026-01-30 | Continual Policy Distillation from Distributed Reinforcement Learning Teachers | Yuxuan Li et.al. | 2601.22475 | null |
| 2026-01-29 | ECO: Quantized Training without Full-Precision Master Weights | Mahdi Nikdan et.al. | 2601.22101 | null |
| 2026-01-29 | Heterogeneous Computing: The Key to Powering the Future of AI Agent Inference | Yiren Zhao et.al. | 2601.22001 | null |
| 2026-01-29 | MoE-ACT: Improving Surgical Imitation Learning Policies through Supervised Mixture-of-Experts | Lorenzo Mazza et.al. | 2601.21971 | null |
| 2026-01-29 | MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts | Evandro S. Ortigossa et.al. | 2601.21866 | null |
| 2026-01-29 | OneMall: One Model, More Scenarios – End-to-End Generative Recommender Family at Kuaishou E-Commerce | Kun Zhang et.al. | 2601.21770 | null |
| 2026-01-29 | Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers | Evandro S. Ortigossa et.al. | 2601.21641 | null |
| 2026-01-29 | Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves | Jonas Knupp et.al. | 2601.21582 | null |
| 2026-01-29 | Multi-Modal Time Series Prediction via Mixture of Modulated Experts | Lige Zhang et.al. | 2601.21547 | null |
| 2026-01-29 | ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory | Yang Zhao et.al. | 2601.21545 | null |
| 2026-01-30 | L$^3$: Large Lookup Layers | Albert Tseng et.al. | 2601.21461 | null |
| 2026-01-29 | ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation | Zihao Huang et.al. | 2601.21420 | null |
| 2026-01-29 | L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts | Minghao Yang et.al. | 2601.21349 | null |
| 2026-01-29 | Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies | Ce Hao et.al. | 2601.21251 | null |
| 2026-01-29 | Scaling Embeddings Outperforms Scaling Experts in Language Models | Hong Liu et.al. | 2601.21204 | null |
| 2026-01-29 | ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling | Yuchen Yang et.al. | 2601.21198 | null |
| 2026-01-29 | Precise measurements of $D^0 \to K^-\ell^+ν_\ell$ and $D^+ \to \bar K^0\ell^+ν_\ell$ decays | BESIII Collaboration et.al. | 2601.21196 | null |
| 2026-01-29 | Search for $ψ_0(4360)\rightarrow ηψ(2S)$ through the process $e^+e^- \rightarrow ηηψ(2S)$ | BESIII Collaboration et.al. | 2601.21190 | null |
| 2026-01-29 | First Experimental Constraint on the Scalar Current in the $D^{0(+)}\to \bar K\ell^+ν_{\ell}$ Transition | BESIII Collaboration et.al. | 2601.21185 | null |
| 2026-01-29 | BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding | Ziyi Zhao et.al. | 2601.21148 | null |
| 2026-01-29 | TRACE: Trajectory Recovery for Continuous Mechanism Evolution in Causal Representation Learning | Shicheng Fan et.al. | 2601.21135 | null |
| 2026-01-28 | ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler | Bohua Zou et.al. | 2601.20755 | null |
| 2026-01-28 | ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code | Mingqiao Mo et.al. | 2601.20679 | null |
| 2026-01-28 | Unsupervised Ensemble Learning Through Deep Energy-based Models | Ariel Maymon et.al. | 2601.20556 | null |
| 2026-01-28 | OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution | Le Zhang et.al. | 2601.20380 | null |
| 2026-01-28 | OSDEnhancer: Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion | Shuoyan Wei et.al. | 2601.20308 | null |
| 2026-01-28 | MiLorE-SSL: Scaling Multilingual Capabilities in Self-Supervised Models without Forgetting | Jing Xu et.al. | 2601.20300 | null |
| 2026-01-28 | HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH | Yueyang Wang et.al. | 2601.20255 | null |
| 2026-01-28 | Hyperparameter Transfer with Mixture-of-Expert Layers | Tianze Jiang et.al. | 2601.20205 | null |
| 2026-01-28 | Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery | Zhipeng Zhang et.al. | 2601.20193 | null |
| 2026-01-27 | Revisiting Incremental Stochastic Majorization-Minimization Algorithms with Applications to Mixture of Experts | TrungKhang Tran et.al. | 2601.19811 | null |
| 2026-01-27 | Component-Level Lesioning of Language Models Reveals Clinically Aligned Aphasia Phenotypes | Yifan Wang et.al. | 2601.19723 | null |
| 2026-01-27 | LoPRo: Enhancing Low-Rank Quantization via Permuted Block-Wise Rotation | Hongyaoxing Gu et.al. | 2601.19675 | null |
| 2026-01-27 | GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining | Shentong Mo et.al. | 2601.19606 | null |
| 2026-01-27 | Search for the isospin-violating decays $\boldsymbol{χ_{cJ}\toΛ\barΣ^{0}+c.c.}$ and $\boldsymbol{η_{c}\toΛ\barΣ^{0}+c.c.}$ | BESIII Collaboration et.al. | 2601.19493 | null |
| 2026-01-27 | Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition | Isha Pandey et.al. | 2601.19451 | null |
| 2026-01-26 | Superlinear Multi-Step Attention | Yufeng Huang et.al. | 2601.18401 | null |
| 2026-01-26 | FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning | Zhaopeng Qiu et.al. | 2601.18150 | null |
| 2026-01-26 | Beyond Static Datasets: Robust Offline Policy Optimization via Vetted Synthetic Transitions | Pedram Agand et.al. | 2601.18107 | null |
| 2026-01-26 | OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion | Zhichao Wang et.al. | 2601.18094 | null |
| 2026-01-26 | LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts | Venmugil Elango et.al. | 2601.18089 | null |
| 2026-01-25 | Domain-Expert-Guided Hybrid Mixture-of-Experts for Medical AI: Integrating Data-Driven Learning with Clinical Priors | Jinchen Gu et.al. | 2601.17977 | null |
| 2026-01-25 | EntWorld: A Holistic Environment and Benchmark for Verifiable Enterprise GUI Agents | Ying Mo et.al. | 2601.17722 | null |
| 2026-01-25 | $\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts | Shota Takashiro et.al. | 2601.17680 | null |
| 2026-01-25 | Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context | Zhihao Zhang et.al. | 2601.17642 | null |
| 2026-01-24 | PILOT: A Perceptive Integrated Low-level Controller for Loco-manipulation over Unstructured Scenes | Xinru Cui et.al. | 2601.17440 | null |
| 2026-01-24 | Topological Protection by Local Support Symmetry and Destructive Interference | Jun-Won Rhim et.al. | 2601.17272 | null |
| 2026-01-23 | Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts | Xuan-Phi Nguyen et.al. | 2601.17111 | null |
| 2026-01-23 | First evidence for $D_s^+ \to f_1(1420) e^+ν_e$ and search for $D_s^+ \to f_1(1285) e^+ν_e$ | BESIII Collaboration et.al. | 2601.16938 | null |
| 2026-01-23 | Coarse-Grained Geometric Quantum Dynamics in the Tensor Network Representation | Mo Sha et.al. | 2601.16913 | null |
| 2026-01-23 | GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints | Andy Zhu et.al. | 2601.16905 | null |
| 2026-01-23 | Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation | Tims Pecerskis et.al. | 2601.16863 | null |
| 2026-01-23 | SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents | Yuhang Wang et.al. | 2601.16746 | null |
| 2026-01-23 | LongCat-Flash-Thinking-2601 Technical Report | Meituan LongCat Team et.al. | 2601.16725 | null |
| 2026-01-23 | Search for the radiative decay $D^+_s \to γK^*(892)^+$ | BESIII Collaboration et.al. | 2601.16476 | null |
| 2026-01-22 | proto-Lightspeed: a high-speed, ultra-low read noise imager on the Magellan Clay Telescope | Christopher Layden et.al. | 2601.16268 | null |
| 2026-01-22 | Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning | Moo Jin Kim et.al. | 2601.16163 | null |
| 2026-01-22 | Universal Refusal Circuits Across LLMs: Cross-Model Transfer via Trajectory Replay and Concept-Basis Reconstruction | Tony Cristofano et.al. | 2601.16034 | null |
| 2026-01-22 | Search for the reaction channel $e^+ e^- \to ηη\,J/ψ$ and the isospin partner of the $Z_c(3900)$ at center-of-mass energies $\sqrt{s} = 4.226-4.950$ GeV | BESIII Collaboration et.al. | 2601.15882 | null |
| 2026-01-22 | LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting | Yuhan Chen et.al. | 2601.15772 | null |
| 2026-01-22 | Redshift-Binned Constraints on the Hubble Constant under $Λ$CDM, CPL, and Padé Cosmography | Zhi-Yuan Mo et.al. | 2601.15765 | null |
| 2026-01-21 | On the diagonal of low bidegree hypersurfaces | Morten Lüders et.al. | 2601.15409 | null |
| 2026-01-21 | Improving MoE Compute Efficiency by Composing Weight and Data Sparsity | Maciej Kilian et.al. | 2601.15370 | null |
| 2026-01-21 | Pb4U-GNet: Resolution-Adaptive Garment Simulation via Propagation-before-Update Graph Network | Aoran Liu et.al. | 2601.15110 | null |
| 2026-01-21 | Mixture-of-Experts Models in Vision: Routing, Optimization, and Generalization | Adam Rokah et.al. | 2601.15021 | null |
| 2026-01-21 | SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction | Kaixuan Zhang et.al. | 2601.14910 | null |
| 2026-01-21 | Language-Coupled Reinforcement Learning for Multilingual Retrieval-Augmented Generation | Rui Qi et.al. | 2601.14896 | null |
| 2026-01-21 | UBATrack: Spatio-Temporal State Space Model for General Multi-Modal Tracking | Qihua Liang et.al. | 2601.14799 | null |
| 2026-01-21 | UniRoute: Unified Routing Mixture-of-Experts for Modality-Adaptive Remote Sensing Change Detection | Qingling Shu et.al. | 2601.14797 | null |
| 2026-01-21 | Robustness of Mixtures of Experts to Feature Noise | Dong Sun et.al. | 2601.14792 | null |
| 2026-01-21 | Online Linear Programming with Replenishment | Yuze Chen et.al. | 2601.14629 | null |
| 2026-01-20 | $π$MPC: A Parallel-in-horizon and Construction-free NMPC Solver | Liang Wu et.al. | 2601.14414 | null |
| 2026-01-20 | Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models | YuanLab.ai et.al. | 2601.14327 | null |
| 2026-01-20 | LLMOrbit: A Circular Taxonomy of Large Language Models -From Scaling Walls to Agentic AI Systems | Badri N. Patro et.al. | 2601.14053 | null |
| 2026-01-20 | Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering | Yuxin Chen et.al. | 2601.14050 | null |
| 2026-01-20 | DExTeR: Weakly Semi-Supervised Object Detection with Class and Instance Experts for Medical Imaging | Adrien Meyer et.al. | 2601.13954 | null |
| 2026-01-20 | The R2Pub Telescopes for Surveying: An Overview and Performance Evaluation of the System | Xuan Song et.al. | 2601.13587 | null |
| 2026-01-20 | ButterflyMoE: Sub-Linear Ternary Experts via Structured Butterfly Orbits | Aryan Karmore et.al. | 2601.13563 | null |
| 2026-01-20 | MN-TSG:Continuous Time Series Generation with Irregular Observations | Xu Zhang et.al. | 2601.13534 | null |
| 2026-01-19 | CLIP-Guided Adaptable Self-Supervised Learning for Human-Centric Visual Tasks | Mingshuang Luo et.al. | 2601.13133 | null |
| 2026-01-19 | Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning | Fengran Mo et.al. | 2601.13115 | null |
| 2026-01-19 | Polychronous Wave Computing: Timing-Native Address Selection in Spiking Networks | Natalila G. Berloff et.al. | 2601.13079 | null |
| 2026-01-19 | Synthesizing Strong-Coupling Kohn-Luttinger Superconductivity in 2D Van der Waals materials | Shi-Cong Mo et.al. | 2601.13074 | null |
| 2026-01-19 | PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning | Zhiyan Hou et.al. | 2601.13020 | null |
| 2026-01-19 | HT-GNN: Hyper-Temporal Graph Neural Network for Customer Lifetime Value Prediction in Baidu Ads | Xiaohui Zhao et.al. | 2601.13013 | null |
| 2026-01-19 | OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models | Shiyuan Li et.al. | 2601.12996 | null |
| 2026-01-19 | PhyG-MoE: A Physics-Guided Mixture-of-Experts Framework for Energy-Efficient GNSS Interference Recognition | Zhihan Zeng et.al. | 2601.12798 | null |
| 2026-01-19 | Topology-Aware Multiscale Mixture of Experts for Efficient Molecular Property Prediction | Long D. Nguyen et.al. | 2601.12637 | null |
| 2026-01-18 | A Mixture of Experts Vision Transformer for High-Fidelity Surface Code Decoding | Hoang Viet Nguyen et.al. | 2601.12483 | null |
| 2026-01-18 | Learning Diverse Skills for Behavior Models with Mixture of Experts | Wangtian Shen et.al. | 2601.12397 | null |
| 2026-01-18 | NADIR: Differential Attention Flow for Non-Autoregressive Transliteration in Indic Languages | Lakshya Tomar et.al. | 2601.12389 | null |
| 2026-01-18 | GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer | Xinyuan Zhao et.al. | 2601.12316 | null |
| 2026-01-18 | Facet-Aware Multi-Head Mixture-of-Experts Model with Text-Enhanced Pre-training for Sequential Recommendation | Mingrui Liu et.al. | 2601.12301 | null |
| 2026-01-16 | Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering | Yuling Shi et.al. | 2601.11255 | null |
| 2026-01-16 | First Measurement of the Absolute Branching Fraction of $η_c \to γγ$ | BESIII Collaboration et.al. | 2601.11236 | null |
| 2026-01-16 | Self-Augmented Mixture-of-Experts for QoS Prediction | Kecheng Cai et.al. | 2601.11036 | null |
| 2026-01-16 | RobuMTL: Enhancing Multi-Task Learning Robustness Against Weather Conditions | Tasneem Shaffee et.al. | 2601.10921 | null |
| 2026-01-15 | Search for sub-GeV dark particles in $η\toπ^0+\rm{invisible}$ decay | BESIII Collaboration et.al. | 2601.10597 | null |
| 2026-01-15 | Deterministic and scalable generation of large Fock states | Mo Xiong et.al. | 2601.10559 | null |
| 2026-01-15 | Algebraic Farkas Lemma and Strong Duality for Perturbed Conic Linear Programming | P. D. Khanh et.al. | 2601.10390 | null |
| 2026-01-15 | MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts | Yuxuan Lou et.al. | 2601.10272 | null |
| 2026-01-15 | A Highly Magnetic Ultra Massive White Dwarf with a 23-minute Rotation Period | Jincheng Guo et.al. | 2601.10188 | null |
| 2026-01-15 | What Gets Activated: Uncovering Domain and Driver Experts in MoE Language Models | Guimin Hu et.al. | 2601.10159 | null |
| 2026-01-15 | MMPG: MoE-based Adaptive Multi-Perspective Graph Fusion for Protein Representation Learning | Yusong Wang et.al. | 2601.10157 | null |
| 2026-01-15 | Extremum Seeking Nonovershooting Control of Strict-Feedback Systems Under Unknown Control Direction | Kaixin Lu et.al. | 2601.09998 | null |
| 2026-01-14 | Progressive Mixture-of-Experts with autoencoder routing for continual RANS turbulence modelling | Haoyu Ji et.al. | 2601.09305 | null |
| 2026-01-14 | A Raman-Gas Spectral Compressor for High-Energy Femtosecond Laser Pulses | Zegui Wang et.al. | 2601.09234 | null |
| 2026-01-15 | A.X K1 Technical Report | Sung Jun Cheon et.al. | 2601.09200 | null |
| 2026-01-14 | WiFo-E: A Scalable Wireless Foundation Model for End-to-End FDD Precoding in Communication Networks | Weibo Wen et.al. | 2601.09186 | null |
| 2026-01-14 | Horseshoe Mixtures-of-Experts (HS-MoE) | Nick Polson et.al. | 2601.09043 | null |
| 2026-01-13 | OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG | Fengran Mo et.al. | 2601.09028 | null |
| 2026-01-12 | TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts | Yu Xu et.al. | 2601.08881 | null |
| 2026-01-13 | MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm | Bowen Zhou et.al. | 2601.08800 | null |
| 2026-01-13 | LWM-Spectro: A Foundation Model for Wireless Baseband Signal Spectrograms | Namhyun Kim et.al. | 2601.08780 | null |
| 2026-01-13 | M$^2$FMoE: Multi-Resolution Multi-View Frequency Mixture-of-Experts for Extreme-Adaptive Time Series Forecasting | Yaohui Huang et.al. | 2601.08631 | null |
| 2026-01-13 | Robust CAPTCHA Using Audio Illusions in the Era of Large Language Models: from Evaluation to Advances | Ziqi Ding et.al. | 2601.08516 | null |
| 2026-01-13 | Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance | Jihang Li et.al. | 2601.08418 | null |
| 2026-01-13 | Controlled LLM Training on Spectral Sphere | Tian Xie et.al. | 2601.08393 | null |
| 2026-01-13 | Deconstructing Pre-training: Knowledge Attribution Analysis in MoE and Dense Models | Bo Wang et.al. | 2601.08383 | null |
| 2026-01-13 | Towards Principled Design of Mixture-of-Experts Language Models under Memory and Inference Constraints | Seng Pei Liew et.al. | 2601.08215 | null |
| 2026-01-12 | Towards Specialized Generalists: A Multi-Task MoE-LoRA Framework for Domain-Specific LLM Adaptation | Yuxin Yang et.al. | 2601.07935 | null |
| 2026-01-12 | An eclipsing 8.56 minute orbital period mass-transferring binary | Emma T. Chickles et.al. | 2601.07925 | null |
| 2026-01-12 | Emotional Support Evaluation Framework via Controllable and Diverse Seeker Simulator | Chaewon Heo et.al. | 2601.07698 | null |
| 2026-01-12 | Amplitude analysis and branching fraction measurement of $J/ψ\to Λ\barΣ^0η+\mathrm{c.c}$ | BESIII Collaboration et.al. | 2601.07617 | null |
| 2026-01-12 | Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models | Xin Cheng et.al. | 2601.07372 | null |
| 2026-01-11 | PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation | Yuanzhe Liu et.al. | 2601.07060 | null |
| 2026-01-11 | Solar Open Technical Report | Sungrae Park et.al. | 2601.07022 | null |
| 2026-01-11 | Deep Learning Based Channel Extrapolation for Dual-Band Massive MIMO Systems | Qikai Xiao et.al. | 2601.06858 | null |
| 2026-01-11 | MoE-DisCo:Low Economy Cost Training Mixture-of-Experts Models | Xin Ye et.al. | 2601.06857 | null |
| 2026-01-11 | MoEScore: Mixture-of-Experts-Based Text-Audio Relevance Score Prediction for Text-to-Audio System Evaluation | Bochao Sun et.al. | 2601.06829 | null |
| 2026-01-11 | SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute | Bowen Shen et.al. | 2601.06790 | null |
| 2026-01-11 | AutoTour: Automatic Photo Tour Guide with Smartphones and LLMs | Huatao Xu et.al. | 2601.06781 | null |
| 2026-01-11 | MTMCS-Bench: Evaluating Contextual Safety of Multimodal Large Language Models in Multi-Turn Dialogues | Zheyuan Liu et.al. | 2601.06757 | null |
| 2026-01-10 | R-Estimation with Right-Censored Data | Glen A. Satten et.al. | 2601.06685 | null |
| 2026-01-10 | Efficient and Reliable Estimation of Named Entity Linking Quality: A Case Study on GutBrainIE | Marco Martinelli et.al. | 2601.06624 | null |
| 2026-01-10 | Hellinger Multimodal Variational Autoencoders | Huyen Khanh Vo et.al. | 2601.06572 | null |
| 2026-01-10 | Physics-guided foundation model for universal speckle removal in ultrathin multimode fiber imaging | Xianrui Zeng et.al. | 2601.06448 | null |
| 2026-01-10 | The Promise of Time-Series Foundation Models for Agricultural Forecasting: Evidence from Marketing Year Average Prices | Le Wang et.al. | 2601.06371 | null |
| 2026-01-09 | Monkey Jump : MoE-Style PEFT for Efficient Multi-Task Learning | Nusrat Jahan Prottasha et.al. | 2601.06356 | null |
| 2026-01-09 | AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving | Tianhao Xu et.al. | 2601.06288 | null |
| 2026-01-09 | Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR | Zijun Min et.al. | 2601.05607 | null |
| 2026-01-09 | Buffered AUC maximization for scoring systems via mixed-integer optimization | Moe Shiina et.al. | 2601.05544 | null |
| 2026-01-09 | Scalable Heterogeneous Graph Learning via Heterogeneous-aware Orthogonal Prototype Experts | Wei Zhou et.al. | 2601.05537 | null |
| 2026-01-08 | MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs | Jiyuan Zhang et.al. | 2601.05296 | null |
| 2026-01-08 | MoE3D: A Mixture-of-Experts Module for 3D Reconstruction | Zichen Wang et.al. | 2601.05208 | null |
| 2026-01-08 | FaST: Efficient and Effective Long-Horizon Forecasting for Large-Scale Spatial-Temporal Graphs via Mixture-of-Experts | Yiji Zhao et.al. | 2601.05174 | link |
| 2026-01-08 | How to Set the Learning Rate for Large-Scale Pre-training? | Yunhua Zhou et.al. | 2601.05049 | null |
| 2026-01-08 | CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters | Ao Sun et.al. | 2601.04885 | null |
| 2026-01-08 | DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation | Guanzhi Deng et.al. | 2601.04823 | null |
| 2026-01-08 | Users Mispredict Their Own Preferences for AI Writing Assistance | Vivian Lai et.al. | 2601.04461 | null |
| 2026-01-08 | Re-Rankers as Relevance Judges | Chuan Meng et.al. | 2601.04455 | null |
| 2026-01-07 | Transitive Expert Error and Routing Problems in Complex AI Systems | Forest Mars et.al. | 2601.04416 | null |
| 2026-01-06 | Scaling Trends for Multi-Hop Contextual Reasoning in Mid-Scale Language Models | Brady Steele et.al. | 2601.04254 | null |
| 2026-01-07 | When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life | Xinyue Lou et.al. | 2601.04043 | null |
| 2026-01-07 | A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems | Qi Wu et.al. | 2601.03992 | null |
| 2026-01-07 | Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures | Ibrahim Delibasoglu et.al. | 2601.03889 | null |
| 2026-01-07 | PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation | Wenlong Huang et.al. | 2601.03782 | null |
| 2026-01-07 | Variational Inference, Entropy, and Orthogonality: A Unified Theory of Mixture-of-Experts | Ye Su et.al. | 2601.03577 | null |
| 2026-01-07 | CALM: Culturally Self-Aware Language Models | Lingzhi Shen et.al. | 2601.03483 | null |
| 2026-01-06 | The Illusion of Specialization: Unveiling the Domain-Invariant “Standing Committee” in Mixture-of-Experts Models | Yan Wang et.al. | 2601.03425 | null |
| 2026-01-06 | AT2024wpp: An Extremely Luminous Fast Ultraviolet Transient Powered by Accretion onto a Black Hole | Daniel A. Perley et.al. | 2601.03337 | null |
| 2026-01-06 | ReCCur: A Recursive Corner-Case Curation Framework for Robust Vision-Language Understanding in Open and Edge Scenarios | Yihan Wei et.al. | 2601.03011 | null |
| 2026-01-08 | MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free | Yishu Lei et.al. | 2601.02967 | null |
| 2026-01-06 | MixTTE: Multi-Level Mixture-of-Experts for Scalable and Adaptive Travel Time Estimation | Wenzhao Jiang et.al. | 2601.02943 | null |
| 2026-01-06 | MiMo-V2-Flash Technical Report | Bangjun Xiao et.al. | 2601.02780 | null |
| 2026-01-05 | Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts | Boxuan Lyu et.al. | 2601.02144 | null |
| 2026-01-05 | Cross section measurement of $e^{+}e^{-}\rightarrow π^{0}π^{0}ψ(3686)$ from $\sqrt{s}=$ 4.008 GeV to 4.951 GeV | BESIII Collaboration et.al. | 2601.02136 | null |
| 2026-01-07 | FormuLLA: A Large Language Model Approach to Generating Novel 3D Printable Formulations | Adeshola Okubena et.al. | 2601.02071 | null |
| 2026-01-05 | GCR: Geometry-Consistent Routing for Task-Agnostic Continual Anomaly Detection | Joongwon Chae et.al. | 2601.01856 | null |
| 2026-01-05 | First Observation of $D^{0(+)}\to \bar Kωe^+ν_e$ and Determination of the Branching Fraction of $\bar K_1(1270)\to \bar K ω$ | BESIII Collaboration et.al. | 2601.01817 | null |
| 2026-01-05 | Causality-Aware Temporal Projection for Video Understanding in Video-LLMs | Zhengjian Kang et.al. | 2601.01804 | null |
| 2026-01-05 | Measurements of the branching fractions of $χ_{cJ}\to 2K^+ 2K^- ω$ and $φK^+ K^- ω$ decays | BESIII Collaboration et.al. | 2601.01758 | null |
| 2026-01-05 | K-EXAONE Technical Report | Eunbi Choi et.al. | 2601.01739 | null |
| 2026-01-05 | Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications | YuanLab.ai et.al. | 2601.01718 | null |
| 2026-01-05 | Varying-Coefficient Mixture of Experts Model | Qicheng Zhao et.al. | 2601.01699 | null |
| 2026-01-06 | Measurements of the absolute branching fractions of the $Λ_{c}^{+}$ hadronic decays | BESIII Collaboration et.al. | 2601.01503 | null |
| 2026-01-04 | Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts | Ruofeng Yang et.al. | 2601.01475 | null |
| 2026-01-06 | Making MoE-based LLM Inference Resilient with Tarragon | Songyu Zhang et.al. | 2601.01310 | null |
| 2026-01-03 | MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance | Hamad Khan et.al. | 2601.01260 | null |
| 2026-01-02 | Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures | Kabir Grover et.al. | 2601.00942 | null |
| 2026-01-02 | HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts | Zihan Fang et.al. | 2601.00583 | null |
| 2026-01-02 | A Language-Agnostic Hierarchical LoRA-MoE Architecture for CTC-based Multilingual ASR | Yuang Zheng et.al. | 2601.00557 | null |
| 2026-01-01 | Geometric Regularization in Mixture-of-Experts: The Disconnect Between Weights and Activations | Hyunjun Kim et.al. | 2601.00457 | null |
| 2026-01-01 | Traffic-MoE: A Sparse Foundation Model for Network Traffic Analysis | Jiajun Zhou et.al. | 2601.00357 | null |
| 2026-01-01 | Identification and Estimation under Multiple Versions of Treatment: Mixture-of-Experts Approach | Kohei Yoshikawa et.al. | 2601.00287 | null |
| 2025-12-31 | Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem | Weixun Wang et.al. | 2512.24873 | null |
| 2025-12-31 | Compute-Accuracy Pareto Frontiers for Open-Source Reasoning Large Language Models | Ákos Prucs et.al. | 2512.24776 | null |
| 2025-12-30 | Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning | Ziqing Fan et.al. | 2512.24265 | null |
| 2025-12-30 | Training Report of TeleChat3-MoE | Xinzhang Liu et.al. | 2512.24157 | null |
| 2025-12-30 | Skyrmion and Meron Crystals in Intermetallic Gd$_3$Ru$_4$Al$_{12}$: Microscopic Model Insights into Chiral Phases | Jiajun Mo et.al. | 2512.24071 | null |
| 2025-12-30 | RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress | Ruixuan Huang et.al. | 2512.23995 | null |
| 2025-12-30 | Towards a bottom-up formulation of spin kinetic theory | Zonglin Mo et.al. | 2512.23960 | null |
| 2026-01-02 | Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling | Chulun Zhou et.al. | 2512.23959 | null |
| 2025-12-30 | Learnable Query Aggregation with KV Routing for Cross-view Geo-localisation | Hualin Ye et.al. | 2512.23938 | null |
| 2025-12-29 | Observations of the Fermi bubbles and the Galactic center excess with the DArk Matter Particle Explorer | F. Alemanno et.al. | 2512.23458 | null |
| 2025-12-29 | Dynamic Subspace Composition: Efficient Adaptation via Contractive Basis Expansion | Vladimer Khasia et.al. | 2512.23448 | null |
| 2025-12-29 | Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss | Ang Lv et.al. | 2512.23447 | null |
| 2025-12-29 | Bitcoin-IPC: Scaling Bitcoin with a Network of Proof-of-Stake Subnets | Marko Vukolić et.al. | 2512.23439 | null |
| 2025-12-29 | Study of $\bar{K}^*(892)^0 η$ and $K_S^0 a_0(980)^0$ in the $D^{0} \to K_{S}^{0}π^0η$ decay | BESIII Collaboration et.al. | 2512.23389 | null |
| 2025-12-30 | YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection | Xu Lin et.al. | 2512.23273 | null |
| 2025-12-28 | Trust Region Masking for Long-Horizon LLM Reinforcement Learning | Yingru Li et.al. | 2512.23075 | null |
| 2025-12-28 | FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment | Boyang Zhang et.al. | 2512.23070 | null |
| 2025-12-28 | Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware | Alex Khalil et.al. | 2512.23029 | null |
| 2025-12-28 | Reach-Avoid Differential game with Reachability Analysis for UAVs: A decomposition approach | Minh Bui et.al. | 2512.22793 | null |
| 2025-12-28 | Text-Routed Sparse Mixture-of-Experts Model with Explanation and Temporal Alignment for Multi-Modal Sentiment Analysis | Dongning Rao et.al. | 2512.22741 | null |
| 2025-12-27 | RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure | Wei Gao et.al. | 2512.22560 | null |
| 2025-12-27 | Scalpel-SAM: A Semi-Supervised Paradigm for Adapting SAM to Infrared Small Object Detection | Zihan Liu et.al. | 2512.22483 | null |
| 2025-12-27 | Bright 4B: Scaling Hyperspherical Learning for Segmentation in 3D Brightfield Microscopy | Amil Khan et.al. | 2512.22423 | null |
| 2025-12-26 | FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion | Zhuoran Zhu et.al. | 2512.22036 | null |
| 2025-12-26 | SWE-RM: Execution-free Feedback For Software Engineering Agents | KaShun Shum et.al. | 2512.21919 | null |
| 2025-12-26 | Accelerate Speculative Decoding with Sparse Computation in Verification | Jikai Wang et.al. | 2512.21911 | null |
| 2025-12-26 | MMCTOP: A Multimodal Textualization and Mixture-of-Experts Framework for Clinical Trial Outcome Prediction | Carolina Aparício et.al. | 2512.21897 | null |
| 2025-12-26 | CrownGen: Patient-customized Crown Generation via Point Diffusion Model | Juyoung Bae et.al. | 2512.21890 | null |
| 2025-12-26 | SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis | Mo Wang et.al. | 2512.21881 | null |
| 2025-12-25 | Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction | Zheng Yin et.al. | 2512.21707 | null |
| 2025-12-25 | Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism | Xinglin Pan et.al. | 2512.21487 | null |
| 2025-12-24 | DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction | Khondoker Mirazul Mumenin et.al. | 2512.21433 | null |
| 2025-12-24 | SparScene: Efficient Traffic Scene Representation via Sparse Graph Learning for Large-Scale Trajectory Generation | Xiaoyu Mo et.al. | 2512.21133 | null |
| 2025-12-26 | Identification with Orthogonal Basis Functions: Convergence Speed, Asymptotic Bias, and Rate-Optimal Pole Selection | Jiayun Li et.al. | 2512.21096 | null |
| 2025-12-25 | GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs | Lichao Wu et.al. | 2512.21008 | null |
| 2025-12-24 | SACodec: Asymmetric Quantization with Semantic Anchoring for Low-Bitrate High-Fidelity Neural Speech Codecs | Zhongren Dong et.al. | 2512.20944 | null |
| 2025-12-24 | RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks | Ningyuan Liu et.al. | 2512.20920 | null |
| 2025-12-24 | NVIDIA Nemotron 3: Efficient and Open Intelligence | NVIDIA et.al. | 2512.20856 | null |
| 2025-12-23 | Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning | NVIDIA et.al. | 2512.20848 | null |
| 2025-12-23 | Defending against adversarial attacks using mixture of experts | Mohammad Meymani et.al. | 2512.20821 | null |
| 2025-12-23 | MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts | Alexandros Christoforos et.al. | 2512.20604 | null |
| 2025-12-23 | Branch Learning in MRI: More Data, More Models, More Training | Yuyang Li et.al. | 2512.20330 | null |
| 2025-12-23 | Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity | Yuxing Gan et.al. | 2512.20291 | null |
| 2025-12-23 | Degradation-Aware Metric Prompting for Hyperspectral Image Restoration | Binfeng Wang et.al. | 2512.20251 | null |
| 2025-12-23 | AMoE: Agglomerative Mixture-of-Experts Vision Foundation Model | Sofian Chaybouti et.al. | 2512.20157 | null |
| 2025-12-23 | Fun-Audio-Chat Technical Report | Qian Chen et.al. | 2512.20156 | null |
| 2025-12-23 | Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting | Sangoh Lee et.al. | 2512.20014 | null |
| 2025-12-23 | Observation and branching fraction measurements of $χ_{cJ}\to p \bar p K^0_S K^0_S$ | BESIII Collaboration et.al. | 2512.19993 | null |
| 2025-12-22 | UCCL-EP: Portable Expert-Parallel Communication | Ziming Mao et.al. | 2512.19849 | null |
| 2025-12-21 | How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts | Sumin Park et.al. | 2512.19765 | null |
| 2025-12-22 | Towards Closed-Loop Embodied Empathy Evolution: Probing LLM-Centric Lifelong Empathic Motion Generation in Unseen Scenarios | Jiawen Wang et.al. | 2512.19551 | null |
| 2025-12-22 | EGM: Efficiently Learning General Motion Tracking Policy for High Dynamic Humanoid Whole-Body Control | Chao Yang et.al. | 2512.19043 | null |
| 2025-12-21 | Tempo as the Stable Cue: Hierarchical Mixture of Tempo and Beat Experts for Music to 3D Dance Generation | Guangtao Lyu et.al. | 2512.18804 | null |
| 2025-12-21 | Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts | Linwei Qiu et.al. | 2512.18718 | null |
| 2025-12-21 | Remoe: Towards Efficient and Low-Cost MoE Inference in Serverless Computing | Wentao Liu et.al. | 2512.18674 | null |
| 2025-12-21 | Commercial Vehicle Braking Optimization: A Robust SIFT-Trajectory Approach | Zhe Li et.al. | 2512.18597 | null |
| 2025-12-20 | Secret mixtures of experts inside your LLM | Enric Boix-Adsera et.al. | 2512.18452 | null |
| 2025-12-20 | MoE Pathfinder: Trajectory-driven Expert Pruning | Xican Yang et.al. | 2512.18425 | null |
| 2025-12-20 | MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation | Kaixing Yang et.al. | 2512.18181 | null |
| 2025-12-20 | Cross section and parametrization of charmonium decay | Xiao-Hu Mo et.al. | 2512.18154 | null |
| 2025-12-19 | MoE-TransMov: A Transformer-based Model for Next POI Prediction in Familiar & Unfamiliar Movements | Ruichen Tan et.al. | 2512.17985 | null |
| 2025-12-19 | Interpreting the strong clustering of ultra-diffuse galaxies by halo spin bias | Qinglin Ma et.al. | 2512.17742 | null |
| 2025-12-19 | Cross sections measurement of $e^+e^-\to Ξ(1530)^0\barΞ^0 + c.c.$ and search for $ψ(3770)\toΞ(1530)^0\barΞ^0 + c.c.$ | BESIII Collaboration et.al. | 2512.17275 | null |
| 2025-12-19 | Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding | Yuqing Li et.al. | 2512.17220 | null |
| 2025-12-19 | Capturing Arbitrary Waveform without Absorption with Synthesis of Complex Frequencies | Zhaohua Tian et.al. | 2512.17156 | null |
| 2025-12-18 | Bandwidth-Efficient Adaptive Mixture-of-Experts via Low-Rank Compensation | Zhenyu Liu et.al. | 2512.17073 | null |
| 2025-12-18 | Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models | Zhongpan Tang et.al. | 2512.16963 | null |
| 2025-12-18 | LinkedOut: Linking World Knowledge Representation Out of Video LLM for Next-Generation Video Recommendation | Haichao Zhang et.al. | 2512.16891 | null |
| 2025-12-18 | The WINTER Observatory: A One-Degree InGaAs Survey Camera to study the Transient Infrared Sky | Danielle Frostig et.al. | 2512.16753 | null |
| 2025-12-18 | PoseMoE: Mixture-of-Experts Network for Monocular 3D Human Pose Estimation | Mengyuan Liu et.al. | 2512.16494 | null |
| 2025-12-18 | Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems | En-Ming Huang et.al. | 2512.16473 | null |
| 2025-12-18 | Pretrained Battery Transformer (PBT): A battery life prediction foundation model | Ruifeng Tan et.al. | 2512.16334 | null |
| 2025-12-19 | Sigma-MoE-Tiny Technical Report | Qingguo Hu et.al. | 2512.16248 | null |
| 2025-12-18 | Open Ad-hoc Categorization with Contextualized Feature Learning | Zilin Wang et.al. | 2512.16202 | null |
| 2025-12-18 | INTELLECT-3: Technical Report | Prime Intellect Team et.al. | 2512.16144 | null |
| 2025-12-17 | Wake instability past a sphere settling in a strongly stratified flow | Chang-Fan Mo et.al. | 2512.15626 | null |
| 2025-12-17 | Measurements of the Absolute Branching Fraction of the Semileptonic Decay $\mathbf{Ξ^{-}\rightarrow Λe^- \bar{ν}_{e}}$ and the Axial Charge of the $\mathbf{Ξ^{-}}$ | BESIII Collaboration et.al. | 2512.15273 | null |
| 2025-12-19 | VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments | Yuze Wu et.al. | 2512.15258 | null |
| 2025-12-17 | Search for the decays $X(3872)\to K_{S}^{0}K^{\pm}π^{\mp}$ and $K^*(892)\bar{K}$ at BESIII | BESIII Collaboration et.al. | 2512.15091 | null |
| 2025-12-19 | Let the Barbarians In: How AI Can Accelerate Systems Performance Research | Audrey Cheng et.al. | 2512.14806 | null |
| 2025-12-15 | SocialNav-MoE: A Mixture-of-Experts Vision Language Model for Socially Compliant Navigation with Reinforcement Fine-Tuning | Tomohito Kawabata et.al. | 2512.14757 | null |
| 2025-12-16 | Measurements of the branching fractions of $χ_{cJ}\to φφη, φφη^{\prime}$ and $φK^+K^-η$ | BESIII Collaboration et.al. | 2512.14369 | null |
| 2025-12-16 | SketchAssist: A Practical Assistant for Semantic Edits and Precise Local Redrawing | Han Zou et.al. | 2512.14140 | null |
| 2025-12-16 | SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations | Wentao Guo et.al. | 2512.14080 | null |
| 2025-12-16 | Sparsity-Controllable Dynamic Top-p MoE for Large Foundation Model Pre-training | Can Jin et.al. | 2512.13996 | null |
| 2025-12-15 | Connection between galaxy morphology and dark-matter halo structure II: predicting disk structure from dark-matter halo properties | Jinning Liang et.al. | 2512.13822 | null |
| 2025-12-13 | RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing | Yuhan Tang et.al. | 2512.13727 | null |
| 2025-12-15 | StutterFuse: Mitigating Modality Collapse in Stuttering Detection with Jaccard-Weighted Metric Learning and Gated Fusion | Guransh Singh et.al. | 2512.13632 | null |
| 2025-12-16 | Janus: Disaggregating Attention and Experts for Scalable MoE Inference | Zhexiang Zhang et.al. | 2512.13525 | null |
| 2025-12-15 | SIGMA: An AI-Empowered Training Stack on Early-Life Hardware | Lei Qu et.al. | 2512.13488 | null |
| 2025-12-15 | Automated Information Flow Selection for Multi-scenario Multi-task Recommendation | Chaohua Yang et.al. | 2512.13396 | null |
| 2025-12-15 | Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC | Qingyuan Liu et.al. | 2512.13047 | null |
| 2025-12-15 | Safe Control of Multi-Agent Systems with Minimal Communication | Mo Yang et.al. | 2512.13021 | null |
| 2025-12-15 | SliceMoE: Bit-Sliced Expert Caching under Miss-Rate Constraints for Efficient MoE Inference | Yuseon Choi et.al. | 2512.12990 | null |
| 2025-12-14 | Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution | Boyang Yan et.al. | 2512.12806 | null |
| 2025-12-14 | Bayesian Optimization Parameter Tuning Framework for a Lyapunov Based Path Following Controller | Zhewen Zheng et.al. | 2512.12649 | null |
| 2025-12-13 | Amplitude Analysis and Branching Fraction Measurement of $D^+ \to π^+π^0π^0$ | BESIII Collaboration et.al. | 2512.12397 | null |
| 2025-12-13 | Fine-Grained Zero-Shot Learning with Attribute-Centric Representations | Zhi Chen et.al. | 2512.12219 | null |
| 2025-12-13 | ALERT Open Dataset and Input-Size-Agnostic Vision Transformer for Driver Activity Recognition using IR-UWB | Jeongjun Park et.al. | 2512.12206 | null |
| 2025-12-13 | MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models | Ahmad Chamma et.al. | 2512.12121 | null |
| 2025-12-12 | Measurement of the cosmic ray nickel energy spectrum from 10 GeV/n to 2 TeV/n with the DAMPE | F. Alemanno et.al. | 2512.11425 | null |
| 2025-12-11 | Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration | Sicheng Mo et.al. | 2512.10954 | null |
| 2025-12-11 | Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration | Wenlong Jiao et.al. | 2512.10581 | null |
| 2025-12-11 | Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment | Han Li et.al. | 2512.10450 | null |
| 2025-12-12 | Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge | Junjie Bai et.al. | 2512.10071 | null |
| 2025-12-10 | Efficient Continual Learning in Neural Machine Translation: A Low-Rank Adaptation Approach | Salvador Carrión et.al. | 2512.09910 | null |
| 2025-12-10 | DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation | Zhizhong Wang et.al. | 2512.09814 | null |
| 2025-12-10 | M3Net: A Multi-Metric Mixture of Experts Network Digital Twin with Graph Neural Networks | Blessed Guda et.al. | 2512.09797 | null |
| 2025-12-10 | First measurement of the absolute branching fractions of $Σ^+$ nonleptonic decays and test of the $ΔI = 1/2$ rule | BESIII Collaboration et.al. | 2512.09628 | null |
| 2025-12-10 | FoundIR-v2: Optimizing Pre-Training Data Mixtures for Image Restoration Foundation Model | Xiang Chen et.al. | 2512.09282 | null |
| 2025-12-10 | Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens | Yanpeng Yu et.al. | 2512.09277 | null |
| 2025-12-10 | Bug Priority Change Prediction: An Exploratory Study on Apache Software | Guangzong Cai et.al. | 2512.09216 | null |
| 2025-12-09 | Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts | Yifan Lyu et.al. | 2512.08814 | null |
| 2025-12-09 | What really matters for person re-identification? A Mixture-of-Experts Framework for Semantic Attribute Importance | Athena Psalta et.al. | 2512.08697 | null |
| 2025-12-09 | Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems | Mingwei Li et.al. | 2512.08411 | null |
| 2025-12-09 | FastBEV++: Fast by Algorithm, Deployable by Design | Yuanpeng Chen et.al. | 2512.08237 | null |
| 2025-12-08 | Relational Visual Similarity | Thao Nguyen et.al. | 2512.07833 | null |
| 2025-12-08 | Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE | Anxiang Zeng et.al. | 2512.07710 | null |
| 2025-12-08 | LongCat-Image Technical Report | Meituan LongCat Team et.al. | 2512.07584 | null |
| 2025-12-12 | MultiMotion: Multi Subject Video Motion Transfer via Video Diffusion Transformer | Penghui Liu et.al. | 2512.07500 | null |
| 2025-12-08 | Equivariant Diffusion for Crystal Structure Prediction | Peijia Lin et.al. | 2512.07289 | null |
| 2025-12-08 | Measurement of the branching fraction of $η\to μ^+ μ^-$ and search for $η\to e^+ e^-$ | BESIII Collaboration et.al. | 2512.07144 | null |
| 2025-12-09 | TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning | Zebin Xing et.al. | 2512.07135 | null |
| 2025-12-08 | PlantBiMoE: A Bidirectional Foundation Model with SparseMoE for Plant Genomes | Kepeng Lin et.al. | 2512.07113 | null |
| 2025-12-07 | Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding | MinCheol Jeon et.al. | 2512.06929 | null |
| 2025-12-07 | Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks | Long Shi et.al. | 2512.06784 | null |
| 2025-12-07 | Statistic-Augmented, Decoupled MoE Routing and Aggregating in Autonomous Driving | Wei-Bin Kou et.al. | 2512.06664 | null |
| 2025-12-06 | Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion | Jaewon Ahn et.al. | 2512.06449 | null |
| 2025-12-04 | The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation | Ranjan Sapkota et.al. | 2512.06032 | null |
| 2025-12-05 | HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies | Zhiying Du et.al. | 2512.05693 | null |
| 2025-12-05 | ProPhy: Progressive Physical Alignment for Dynamic World Simulation | Zijun Wang et.al. | 2512.05564 | null |
| 2025-12-04 | Evidence for the semileptonic decays $Λ_c^{+} \to Σ^{\pm} π^{\mp} e^+ ν_e$ | BESIII Collaboration et.al. | 2512.05178 | null |
| 2025-12-09 | EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture | Xin He et.al. | 2512.04810 | null |
| 2025-12-04 | Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild | Yigui Feng et.al. | 2512.04728 | null |
| 2025-12-04 | Study of the reaction $Ξ^{0}n\rightarrowΛΛX$ using $Ξ^{0}$-nucleus scattering | BESIII Collaboration et.al. | 2512.04701 | null |
| 2025-12-04 | Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space | Joey Hong et.al. | 2512.04601 | null |
| 2025-12-04 | The Binary Fraction of Stars in the Dwarf Galaxy Ursa Minor via Dark Energy Spectroscopic Instrument | Tian Qiu et.al. | 2512.04477 | null |
| 2025-12-04 | Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems | Zehao Fan et.al. | 2512.04476 | null |
| 2025-12-03 | Small Models Achieve Large Language Model Performance: Evaluating Reasoning-Enabled AI for Secure Child Welfare Research | Zia Qi et.al. | 2512.04261 | null |
| 2025-12-03 | Decoding Large Language Diffusion Models with Foreseeing Movement | Yichuan Mo et.al. | 2512.04135 | null |
| 2025-12-03 | Stable Signer: Hierarchical Sign Language Generative Model | Sen Fang et.al. | 2512.04048 | null |
| 2025-12-03 | OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference | Liujianfu Wang et.al. | 2512.03927 | null |
| 2025-12-04 | A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models | X. Y. Han et.al. | 2512.03915 | null |
| 2025-12-03 | Parsimonious Clustering of Covariance Matrices | Yixi Xu et.al. | 2512.03912 | null |
| 2025-12-03 | Measurement of the hyperon weak radiative decay $Ξ^0\toγΣ^0$ at BESIII | BESIII Collaboration et.al. | 2512.03877 | null |
| 2025-12-03 | Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation | Subin Kim et.al. | 2512.03534 | null |
| 2025-12-03 | CellScout: Visual Analytics for Mining Biomarkers in Cell State Discovery | Rui Sheng et.al. | 2512.03485 | null |
| 2025-12-03 | Unconventional Magneto-Optical Effects in Altermagnets | Yongpan Li et.al. | 2512.03435 | null |
| 2025-12-03 | SSLfmm: An R Package for Semi-Supervised Learning with a Mixed-Missingness Mechanism in Finite Mixture Models | Geoffrey J. McLachlan et.al. | 2512.03322 | null |
| 2025-12-02 | Intrinsic Second-Order Topological Superconductors with Tunable Majorana Zero Modes | Xiao-Jiao Wang et.al. | 2512.02775 | null |
| 2025-12-02 | Stepwise Schema-Guided Prompting Framework with Parameter Efficient Instruction Tuning for Multimedia Event Extraction | Xiang Yuan et.al. | 2512.02584 | null |
| 2025-12-02 | SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts | Jiaqi Liu et.al. | 2512.02517 | null |
| 2025-12-02 | A Fully First-Order Layer for Differentiable Optimization | Zihao Zhao et.al. | 2512.02494 | null |
| 2025-12-02 | Quasi-steady electron-excitonic complexes coupling in a two-dimensional semiconductor | Shangkun Mo et.al. | 2512.02490 | null |
| 2025-12-02 | Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective Attention | Wenyi Xiong et.al. | 2512.02368 | null |
| 2025-12-02 | Understanding and Harnessing Sparsity in Unified Multimodal Models | Shwai He et.al. | 2512.02351 | null |
| 2025-12-02 | OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning | Boyu Zhu et.al. | 2512.02306 | null |
| 2025-12-01 | Towards Unified Video Quality Assessment | Chen Feng et.al. | 2512.02224 | null |
| 2025-12-01 | ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation | Chenyang Gu et.al. | 2512.02013 | null |
| 2025-12-01 | Multimodal Mixture-of-Experts for ISAC in Low-Altitude Wireless Networks | Kai Zhang et.al. | 2512.01750 | null |
| 2025-12-01 | GRASP: Guided Residual Adapters with Sample-wise Partitioning | Felix Nützel et.al. | 2512.01675 | null |
| 2025-12-01 | Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing Imagery | Zhicheng Zhao et.al. | 2512.01665 | null |
| 2025-12-01 | Cuffless Blood Pressure Estimation from Six Wearable Sensor Modalities in Multi-Motion-State Scenarios | Yiqiao Chen et.al. | 2512.01653 | null |
| 2025-12-01 | Integrated YOLOP Perception and Lyapunov-based Control for Autonomous Mobile Robot Navigation on Track | Mo Chen et.al. | 2512.01608 | null |
| 2025-12-01 | Personalized optimization of pediatric HD-tDCS for dose consistency and target engagement | Zeming Liu et.al. | 2512.01406 | null |
| 2025-12-02 | Stabilizing Reinforcement Learning with LLMs: Formulation and Practices | Chujie Zheng et.al. | 2512.01374 | null |
| 2025-12-01 | TagSplat: Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking | Hanzhi Guo et.al. | 2512.01329 | null |
| 2025-12-01 | Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe | Yahui Liu et.al. | 2512.01252 | null |
| 2025-11-30 | Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios | Jianxiang Zang et.al. | 2512.00920 | null |
| 2025-11-30 | Elastic Mixture of Rank-Wise Experts for Knowledge Reuse in Federated Fine-Tuning | Yebo Wu et.al. | 2512.00902 | null |
| 2025-11-30 | Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking | Lingling Fu et.al. | 2512.00724 | null |
| 2025-11-29 | GCMCG: A Clustering-Aware Graph Attention and Expert Fusion Network for Multi-Paradigm, Multi-task, and Cross-Subject EEG Decoding | Yiqiao Chen et.al. | 2512.00574 | null |
| 2025-11-28 | Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model | Junshu Tang et.al. | 2511.23429 | null |
| 2025-11-28 | LFM2 Technical Report | Alexander Amini et.al. | 2511.23404 | null |
| 2025-11-28 | Chart2Code-MoLA: Efficient Multi-Modal Code Generation via Adaptive Expert Routing | Yifei Wang et.al. | 2511.23321 | null |
| 2025-11-28 | Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models | Xiang Hu et.al. | 2511.23319 | null |
| 2025-11-28 | Multi-Modal Scene Graph with Kolmogorov-Arnold Experts for Audio-Visual Question Answering | Zijian Fu et.al. | 2511.23304 | null |
| 2025-11-28 | Experts are all you need: A Composable Framework for Large Language Model Inference | Shrihari Sridharan et.al. | 2511.22955 | null |
| 2025-11-28 | EnECG: Efficient Ensemble Learning for Electrocardiogram Multi-task Foundation Model | Yuhao Xu et.al. | 2511.22935 | null |
| 2025-11-27 | Architecture Decoupling Is Not All You Need For Unified Multimodal Model | Dian Zheng et.al. | 2511.22663 | null |
| 2025-11-27 | OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency | Jun Wang et.al. | 2511.22481 | null |
| 2025-11-27 | Foundation Model for Intelligent Wireless Communications | Boxun Liu et.al. | 2511.22222 | null |
| 2025-11-27 | MoE3D: Mixture of Experts meets Multi-Modal 3D Understanding | Yu Li et.al. | 2511.22103 | null |
| 2025-11-27 | Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian | Yiran Zhang et.al. | 2511.22069 | null |
| 2025-11-26 | Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models | Naifu Zhang et.al. | 2511.21663 | null |
| 2025-11-26 | Continual Error Correction on Low-Resource Devices | Kirill Paramonov et.al. | 2511.21652 | null |
| 2025-11-27 | Qwen3-VL Technical Report | Shuai Bai et.al. | 2511.21631 | null |
| 2025-11-26 | Enhanced Landmark Detection Model in Pelvic Fluoroscopy using 2D/3D Registration Loss | Chou Mo et.al. | 2511.21575 | null |
| 2025-11-26 | Scaling limits of critical FK-decorated random planar maps with $q=4$ | William Da Silva et.al. | 2511.21480 | null |
| 2025-11-26 | Study of the reactions $\bar{n} p \to 2π^{+}π^{-}$, $2π^{+}π^{-}π^{0}$, and $2π^{+}π^{-}2π^{0}$ using $J/ψ\to p π^{-}\bar{n}$ | BESIII Collaboration et.al. | 2511.21462 | null |
| 2025-11-26 | MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training | Lu Zhao et.al. | 2511.21431 | null |
| 2025-11-26 | Do Reasoning Vision-Language Models Inversely Scale in Test-Time Compute? A Distractor-centric Empirical Analysis | Jiyun Bae et.al. | 2511.21397 | null |
| 2025-11-26 | Conditional Generative Modeling of Stochastic LTI Systems: A Behavioral Approach | Jiayun Li et.al. | 2511.21219 | null |
| 2025-11-26 | MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts | Ivan Novikov et.al. | 2511.21089 | null |
| 2025-11-25 | HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation | Xiang Wang et.al. | 2511.20520 | null |
| 2025-11-25 | Soft Adaptive Policy Optimization | Chang Gao et.al. | 2511.20347 | null |
| 2025-11-25 | ADNet: A Large-Scale and Extensible Multi-Domain Benchmark for Anomaly Detection Across 380 Real-World Categories | Hai Ling et.al. | 2511.20169 | null |
| 2025-11-25 | Adaptive Knowledge Transfer for Cross-Disciplinary Cold-Start Knowledge Tracing | Yulong Deng et.al. | 2511.20009 | null |
| 2025-11-25 | SONIC: Spectral Optimization of Noise for Inpainting with Consistency | Seungyeon Baek et.al. | 2511.19985 | null |
| 2025-11-25 | Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models | Wentao Hu et.al. | 2511.19822 | null |
| 2025-11-22 | Exploiting the Experts: Unauthorized Compression in MoE-LLMs | Pinaki Prasad Guha Neogi et.al. | 2511.19480 | null |
| 2025-11-22 | Tracking and Segmenting Anything in Any Modality | Tianlu Zhang et.al. | 2511.19475 | null |
| 2025-11-24 | Life-IQA: Boosting Blind Image Quality Assessment through GCN-enhanced Layer Interaction and MoE-based Feature Decoupling | Long Tang et.al. | 2511.19024 | null |
| 2025-11-24 | OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs | Yuting Gao et.al. | 2511.19023 | null |
| 2025-11-24 | Dynamic Mixture of Experts Against Severe Distribution Shifts | Donghu Kim et.al. | 2511.18987 | null |
| 2025-11-23 | HiFi-MambaV2: Hierarchical Shared-Routed MoE for High-Fidelity MRI Reconstruction | Pengcheng Fang et.al. | 2511.18534 | null |
| 2025-11-23 | AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert | Yuting Gao et.al. | 2511.18314 | null |
| 2025-11-22 | PromptMoE: Generalizable Zero-Shot Anomaly Detection via Visually-Guided Prompt Mixtures | Yuheng Shao et.al. | 2511.18116 | null |
| 2025-11-22 | CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking | Hao Li et.al. | 2511.17967 | null |
| 2025-11-22 | Measuring the Impact of Lexical Training Data Coverage on Hallucination Detection in Large Language Models | Shuo Zhang et.al. | 2511.17946 | null |
| 2025-11-22 | FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning | Guoyang Xia et.al. | 2511.17885 | null |
| 2025-11-22 | Equivalence of Context and Parameter Updates in Modern Transformer Blocks | Adrian Goldwaser et.al. | 2511.17864 | null |
| 2025-11-21 | Unified Class and Domain Incremental Learning with Mixture of Experts for Indoor Localization | Akhil Singampalli et.al. | 2511.17829 | null |
| 2025-11-21 | Boosting Brain-inspired Path Integration Efficiency via Learning-based Replication of Continuous Attractor Neurodynamics | Zhangyu Ge et.al. | 2511.17687 | null |
| 2025-11-21 | Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required? | Sukwon Yun et.al. | 2511.17400 | null |
| 2025-11-21 | MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment | Huangbiao Xu et.al. | 2511.17397 | link |
| 2025-11-21 | Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design | Quentin Anthony et.al. | 2511.17127 | null |
| 2025-11-21 | Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters | Zhan Su et.al. | 2511.17044 | null |
| 2025-11-21 | VLM-Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions | Qianyi Shao et.al. | 2511.16998 | null |
| 2025-11-21 | RadioKMoE: Knowledge-Guided Radiomap Estimation with Kolmogorov-Arnold Networks and Mixture-of-Experts | Fupei Guo et.al. | 2511.16986 | null |
| 2025-11-21 | MicroMoE: Fine-Grained Load Balancing for Mixture-of-Experts with Token Scheduling | Chenqi Zhao et.al. | 2511.16947 | null |
| 2025-11-20 | Search for the charmonium weak decay $J/ψ\to\bar{D}^0\bar{K}^{*0}+{\rm c.c.}$ | BESIII Collaboration et.al. | 2511.16083 | null |
| 2025-11-20 | Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution | Xiao He et.al. | 2511.16024 | null |
| 2025-11-19 | AquaSentinel: Next-Generation AI System Integrating Sensor Networks for Urban Underground Water Pipeline Anomaly Detection via Collaborative MoE-LLM Agent Architecture | Qiming Guo et.al. | 2511.15870 | null |
| 2025-11-19 | MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping | Yushi Huang et.al. | 2511.15690 | null |
| 2025-11-19 | Search for the lepton number violating process $Ξ^- \rightarrow Σ^+ e^- e^- +c.c.$ | BESIII Collaboration et.al. | 2511.15394 | null |
| 2025-11-19 | VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation | Tairan He et.al. | 2511.15200 | null |
| 2025-11-19 | GPU-Initiated Networking for NCCL | Khaled Hamidouche et.al. | 2511.15076 | null |
| 2025-11-19 | WiCo-PG: Wireless Channel Foundation Model for Pathloss Map Generation via Synesthesia of Machines | Mingran Sun et.al. | 2511.15030 | null |
| 2025-11-19 | WiCo-MG: Wireless Channel Foundation Model for Multipath Generation via Synesthesia of Machines | Zengrui Han et.al. | 2511.15026 | null |
| 2025-11-19 | Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference | Kexin Chu et.al. | 2511.15015 | null |
| 2025-11-18 | HMC: Learning Heterogeneous Meta-Control for Contact-Rich Loco-Manipulation | Lai Wei et.al. | 2511.14756 | null |
| 2025-11-18 | Towards Stable and Structured Time Series Generation with Perturbation-Aware Flow Matching | Jintao Zhang et.al. | 2511.14488 | null |
| 2025-11-18 | MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts | Wenfeng Wang et.al. | 2511.14102 | null |
| 2025-11-18 | FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration | Jingren Liu et.al. | 2511.14099 | null |
| 2025-11-18 | SMGeo: Cross-View Object Geo-Localization with Grid-Level Mixture-of-Experts | Fan Zhang et.al. | 2511.14093 | null |
| 2025-11-17 | MoMoE: A Mixture of Expert Agent Model for Financial Sentiment Analysis | Peng Shu et.al. | 2511.13983 | null |
| 2025-11-17 | InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE | Lipeng Wang et.al. | 2511.13488 | null |
| 2025-11-18 | YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection | Ori Meiraz et.al. | 2511.13344 | null |
| 2025-11-17 | Skeletons Speak Louder than Text: A Motion-Aware Pretraining Paradigm for Video-Based Person Re-Identification | Rifen Lin et.al. | 2511.13150 | null |
| 2025-11-17 | Self-Adaptive Graph Mixture of Models | Mohit Meena et.al. | 2511.13062 | null |
| 2025-11-17 | Tokenize Once, Recommend Anywhere: Unified Item Tokenization for Multi-domain LLM-based Recommendation | Yu Hou et.al. | 2511.12922 | null |
| 2025-11-17 | Simple Lines, Big Ideas: Towards Interpretable Assessment of Human Creativity from Drawings | Zihao Lin et.al. | 2511.12880 | null |
| 2025-11-16 | Connectivity-Guided Sparsification of 2-FWL GNNs: Preserving Full Expressivity with Improved Efficiency | Rongqin Chen et.al. | 2511.12838 | null |
| 2025-11-16 | Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data | Yunxin Li et.al. | 2511.12609 | null |
| 2025-11-16 | SEMC: Structure-Enhanced Mixture-of-Experts Contrastive Learning for Ultrasound Standard Plane Recognition | Qing Cai et.al. | 2511.12559 | null |
| 2025-11-16 | MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics | Jing Li et.al. | 2511.12525 | null |
| 2025-11-16 | MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding | Zhanheng Nie et.al. | 2511.12449 | null |
| 2025-11-16 | Self-Supervised Visual Prompting for Cross-Domain Road Damage Detection | Xi Xiao et.al. | 2511.12410 | null |
| 2025-11-15 | SAC-MoE: Reinforcement Learning with Mixture-of-Experts for Control of Hybrid Dynamical Systems with Uncertainty | Leroy D’Souza et.al. | 2511.12361 | null |
| 2025-11-15 | AMR-MoEGA: Antimicrobial Resistance Prediction using Mixture of Experts and Genetic Algorithms | Anshul Bagaria et.al. | 2511.12223 | null |
| 2025-11-15 | ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction | Ruochen Li et.al. | 2511.12214 | null |
| 2025-11-14 | FarSkip-Collective: Unhobbling Blocking Communication in Mixture of Experts Models | Yonatan Dukler et.al. | 2511.11505 | null |
| 2025-11-14 | Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification | Qinghao Gao et.al. | 2511.11460 | null |
| 2025-11-14 | SPOT: Single-Shot Positioning via Trainable Near-Field Rainbow Beamforming | Yeyue Cai et.al. | 2511.11391 | null |
| 2025-11-14 | Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing | Cong Cao et.al. | 2511.11236 | null |
| 2025-11-14 | DoReMi: A Domain-Representation Mixture Framework for Generalizable 3D Understanding | Mingwei Xing et.al. | 2511.11232 | null |
| 2025-11-14 | ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization | Anzhe Cheng et.al. | 2511.10971 | null |
| 2025-11-14 | Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go | Yashshi Pipalani et.al. | 2511.10868 | null |
| 2025-11-13 | Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts | Sumin Lee et.al. | 2511.10300 | null |
| 2025-11-13 | RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo | Jueun Ko et.al. | 2511.10107 | null |
| 2025-11-13 | BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference | Yun Wang et.al. | 2511.10054 | null |
| 2025-11-14 | HI-TransPA: Hearing Impairments Translation Personal Assistant | Zhiming Ma et.al. | 2511.09915 | null |
| 2025-11-13 | ConSurv: Multimodal Continual Learning for Survival Analysis | Dianzhi Yu et.al. | 2511.09853 | null |
| 2025-11-11 | Let the Experts Speak: Improving Survival Prediction & Calibration via Mixture-of-Experts Heads | Todd Morrill et.al. | 2511.09567 | null |
| 2025-11-12 | SMF-VO: Direct Ego-Motion Estimation via Sparse Motion Fields | Sangheon Yang et.al. | 2511.09072 | null |
| 2025-11-12 | UniMM-V2X: MoE-Enhanced Multi-Level Fusion for End-to-End Cooperative Autonomous Driving | Ziyi Song et.al. | 2511.09013 | null |
| 2025-11-12 | Selective Sinkhorn Routing for Improved Sparse Mixture of Experts | Duc Anh Nguyen et.al. | 2511.08972 | null |
| 2025-11-12 | Bayesian Mixture of Experts For Large Language Models | Maryam Dialameh et.al. | 2511.08968 | null |
| 2025-11-12 | An Improved Dual-Attention Transformer-LSTM for Small-Sample Prediction of Modal Frequency and Actual Anchor Radius in Micro Hemispherical Resonator Design | Yuyi Yao et.al. | 2511.08900 | null |
| 2025-11-11 | OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild | Yuncheng Guo et.al. | 2511.08423 | null |
| 2025-11-11 | Text-based Aerial-Ground Person Retrieval | Xinyu Zhou et.al. | 2511.08369 | null |
| 2025-11-14 | Towards Non-Stationary Time Series Forecasting with Temporal Stabilization and Frequency Differencing | Junkai Lu et.al. | 2511.08229 | null |
| 2025-11-13 | National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech – The SpeechCARE Solution | Maryam Zolnoori et.al. | 2511.08132 | null |
| 2025-11-13 | Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression | Cheng Yuan et.al. | 2511.08066 | null |
| 2025-11-11 | TouchWalker: Real-Time Avatar Locomotion from Touchscreen Finger Walking | Geuntae Park et.al. | 2511.07860 | null |
| 2025-11-10 | One Router to Route Them All: Homogeneous Expert Routing for Heterogeneous Graph Transformers | Georgiy Shakirov et.al. | 2511.07603 | null |
| 2025-11-12 | Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs | Zhongyang Li et.al. | 2511.07419 | null |
| 2025-11-11 | Surgical Agent Orchestration Platform for Voice-directed Patient Data Interaction | Hyeryun Park et.al. | 2511.07392 | null |
| 2025-11-10 | AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning | Qile Jiang et.al. | 2511.07262 | null |
| 2025-11-10 | Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture | Tianhao Fu et.al. | 2511.07110 | null |
| 2025-11-10 | CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition | Hung-Yang Sung et.al. | 2511.06860 | null |
| 2025-11-10 | S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning | Jiangwen Dong et.al. | 2511.06727 | null |
| 2025-11-10 | Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation | Evelyn Chee et.al. | 2511.06723 | null |
| 2025-11-09 | Route Experts by Sequence, not by Token | Tiansheng Wen et.al. | 2511.06494 | null |
| 2025-11-09 | HyMoERec: Hybrid Mixture-of-Experts for Sequential Recommendation | Kunrong Li et.al. | 2511.06388 | null |
| 2025-11-09 | DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation | Speed Zhu et.al. | 2511.06307 | null |
| 2025-11-09 | A Mixture-of-Experts Framework with Log-Logistic Components for Survival Analysis on Histopathology Images | Ardhendu Sekhar et.al. | 2511.06266 | null |
| 2025-11-08 | MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference | Myunghyun Rhee et.al. | 2511.06010 | null |
| 2025-11-08 | DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities | Nagur Shareef Shaik et.al. | 2511.05968 | null |
| 2025-11-08 | MoEGCL: Mixture of Ego-Graphs Contrastive Representation Learning for Multi-View Clustering | Jian Zhu et.al. | 2511.05876 | null |
| 2025-11-08 | In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading | Shuning Lin et.al. | 2511.05814 | null |
| 2025-11-07 | Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder | Zhen Xu et.al. | 2511.05745 | null |
| 2025-11-07 | BrainCSD: A Hierarchical Consistency-Driven MoE Foundation Model for Unified Connectome Synthesis and Multitask Brain Trait Prediction | Xiongri Shen et.al. | 2511.05630 | null |
| 2025-11-07 | Quantum-Uncertainty-Governed Spin Dynamics in s-d Coupled Systems | Jie Zheng et.al. | 2511.05388 | null |
| 2025-11-07 | OvA-LP: A Simple and Efficient Framework for Federated Learning on Non-IID Data | Dongjin Park et.al. | 2511.05028 | null |
| 2025-11-07 | MoE-DP: An MoE-Enhanced Diffusion Policy for Robust Long-Horizon Robotic Manipulation with Skill Decomposition and Failure Recovery | Baiye Cheng et.al. | 2511.05007 | null |
| 2025-11-06 | PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference | Yushu Zhao et.al. | 2511.04805 | null |
| 2025-11-06 | GNN-MoE: Context-Aware Patch Routing using GNNs for Parameter-Efficient Domain Generalization | Mahmoud Soliman et.al. | 2511.04008 | null |
| 2025-11-05 | GMoPE:A Prompt-Expert Mixture Framework for Graph Foundation Models | Zhibin Wang et.al. | 2511.03251 | null |
| 2025-11-04 | From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos | Xun Wang et.al. | 2511.02762 | null |
| 2025-11-04 | Verifying LLM Inference to Prevent Model Weight Exfiltration | Roy Rinberg et.al. | 2511.02620 | null |
| 2025-11-04 | RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains | Tianle Pu et.al. | 2511.02331 | null |
| 2025-11-04 | FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error | Fengjuan Wang et.al. | 2511.02302 | null |
| 2025-11-04 | Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining | Costin-Andrei Oncescu et.al. | 2511.02237 | null |
| 2025-11-03 | Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing | Song Gao et.al. | 2511.01743 | null |
| 2025-11-03 | HMVLM: Human Motion-Vision-Lanuage Model via MoE LoRA | Lei Hu et.al. | 2511.01463 | null |
| 2025-11-04 | CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing | Yifan Zhou et.al. | 2511.01197 | null |
| 2025-11-03 | DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection | Guoxin Ma et.al. | 2511.01192 | null |
| 2025-11-01 | OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback | Kai Luo et.al. | 2511.00510 | null |
| 2025-10-31 | LongCat-Flash-Omni Technical Report | Meituan LongCat Team et.al. | 2511.00279 | null |
| 2025-10-31 | Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals | Xiangyu Fan et.al. | 2510.27684 | null |
| 2025-10-31 | RDMA Point-to-Point Communication for LLM Systems | Nandor Licker et.al. | 2510.27656 | null |
| 2025-10-31 | MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts | Jingnan Gao et.al. | 2510.27234 | null |
| 2025-10-31 | AFM-Net: Advanced Fusing Hierarchical CNN Visual Priors with Global Sequence Modeling for Remote Sensing Image Scene Classification | Yuanhao Tang et.al. | 2510.27155 | null |
| 2025-10-30 | Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement | Aaditya Shukla et.al. | 2510.27051 | null |
| 2025-10-30 | Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems | Hongbo Li et.al. | 2510.27004 | null |
| 2025-10-30 | MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation | Arghavan Rezvani et.al. | 2510.26996 | null |
| 2025-10-30 | ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference | Zixu Shen et.al. | 2510.26730 | null |
| 2025-10-30 | Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications | Chuang Zhang et.al. | 2510.26628 | null |
| 2025-10-30 | Asymptotic meshes from $r$-variational adaptation methods for static problems in one dimension | Darith Hun et.al. | 2510.26375 | null |
| 2025-10-30 | MossNet: Mixture of State-Space Experts is a Multi-Head Attention | Shikhar Tuli et.al. | 2510.26182 | null |
| 2025-10-29 | Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis | Hyeonjun Lee et.al. | 2510.26014 | null |
| 2025-10-31 | Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training | Hong Wang et.al. | 2510.25803 | null |
| 2025-10-29 | Revisiting scalable sequential recommendation with Multi-Embedding Approach and Mixture-of-Experts | Qiushi Pan et.al. | 2510.25285 | null |
| 2025-10-29 | MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference | Xinru Tang et.al. | 2510.25258 | null |
| 2025-10-29 | H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts | Peilin Tan et.al. | 2510.25091 | null |
| 2025-10-28 | Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation | Inclusion AI et.al. | 2510.24821 | null |
| 2025-10-28 | Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance | Yujie Wei et.al. | 2510.24711 | null |
| 2025-10-28 | Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation | Xiucheng Zhang et.al. | 2510.24055 | null |
| 2025-10-26 | Sparsity and Superposition in Mixture of Experts | Marmik Chaudhari et.al. | 2510.23671 | null |
| 2025-10-27 | EMTSF:Extraordinary Mixture of SOTA Models for Time Series Forecasting | Musleh Alharthi et.al. | 2510.23396 | null |
| 2025-10-27 | Rethinking GSPO: The Perplexity-Entropy Equivalence | Chi Liu et.al. | 2510.23142 | null |
| 2025-10-27 | Knocking-Heads Attention | Zhanchao Zhou et.al. | 2510.23052 | null |
| 2025-10-27 | Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts | Di Zhang et.al. | 2510.23027 | null |
| 2025-10-27 | MoEMeta: Mixture-of-Experts Meta Learning for Few-Shot Relational Learning | Han Wu et.al. | 2510.23013 | null |
| 2025-10-25 | Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation | Ling-Team et.al. | 2510.22115 | null |
| 2025-10-23 | Addressing Corner Cases in Autonomous Driving: A World Model-based Approach with Mixture of Experts and LLMs | Haicheng Liao et.al. | 2510.21867 | null |
| 2025-10-24 | PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling | Andrea Bonfanti et.al. | 2510.21262 | null |
| 2025-10-24 | Adaptive Graph Mixture of Residual Experts: Unsupervised Learning on Diverse Graphs with Heterogeneous Specialization | Yunlong Chu et.al. | 2510.21207 | null |
| 2025-10-24 | Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts | Yanguang Sun et.al. | 2510.21114 | null |
| 2025-10-24 | MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning | Siyong Chen et.al. | 2510.21093 | null |
| 2025-10-23 | Bayesian Jammer Localization with a Hybrid CNN and Path-Loss Mixture of Experts | Mariona Jaramillo-Civill et.al. | 2510.20666 | null |
| 2025-10-23 | xTime: Extreme Event Prediction with Hierarchical Knowledge Distillation and Expert Fusion | Quan Li et.al. | 2510.20651 | null |
| 2025-10-23 | Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning | Xiaohan Lan et.al. | 2510.20519 | null |
| 2025-10-23 | A Parameter-Efficient Mixture-of-Experts Framework for Cross-Modal Geo-Localization | LinFeng Li et.al. | 2510.20291 | null |
| 2025-10-23 | AsyncHZP: Hierarchical ZeRO Parallelism with Asynchronous Scheduling for Scalable LLM Training | Huawei Bai et.al. | 2510.20111 | null |
| 2025-10-22 | HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission | Weihao Yang et.al. | 2510.19470 | null |
| 2025-10-22 | MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs | Xinfeng Xia et.al. | 2510.19366 | null |
| 2025-10-22 | Modeling Turn-Taking with Semantically Informed Gestures | Varsha Suresh et.al. | 2510.19350 | null |
| 2025-10-23 | RailS: Load Balancing for All-to-All Communication in Distributed Mixture-of-Experts Training | Heng Xu et.al. | 2510.19262 | null |
| 2025-10-22 | A Design Science Blueprint for an Orchestrated AI Assistant in Doctoral Supervision | Teo Susnjak et.al. | 2510.19227 | null |
| 2025-10-23 | MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting | In-Hwan Jin et.al. | 2510.19210 | null |
| 2025-10-25 | Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model | Ling Team et.al. | 2510.18855 | null |
| 2025-10-21 | Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework | Yujie Xing et.al. | 2510.18825 | null |
| 2025-10-21 | Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification | Bin Gu et.al. | 2510.18533 | null |
| 2025-10-21 | Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study | Gangda Deng et.al. | 2510.18370 | null |
| 2025-10-21 | DeepSeek-OCR: Contexts Optical Compression | Haoran Wei et.al. | 2510.18234 | null |
| 2025-10-22 | L-MoE: End-to-End Training of a Lightweight Mixture of Low-Rank Adaptation Experts | Shihao Ji et.al. | 2510.17898 | null |
| 2025-10-20 | Towards 3D Objectness Learning in an Open World | Taichi Liu et.al. | 2510.17686 | null |
| 2025-10-20 | Intelligent Communication Mixture-of-Experts Boosted-Medical Image Segmentation Foundation Model | Xinwei Zhang et.al. | 2510.17684 | null |
| 2025-10-20 | Learned Inertial Odometry for Cycling Based on Mixture of Experts Algorithm | Hao Qiao et.al. | 2510.17604 | null |
| 2025-10-23 | Photon radiation induced by rescattering in strong-interacting medium with a magnetic field | Yue Zhang et.al. | 2510.17597 | null |
| 2025-10-20 | ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts | Zheyue Tan et.al. | 2510.17483 | null |
| 2025-10-19 | Leave It to the Experts: Detecting Knowledge Distillation via MoE Expert Signatures | Pingzhi Li et.al. | 2510.16968 | null |
| 2025-10-19 | End-to-end Listen, Look, Speak and Act | Siyin Wang et.al. | 2510.16756 | null |
| 2025-10-18 | NeurIPT: Foundation Model for Neural Interfaces | Zitao Fang et.al. | 2510.16548 | link |
| 2025-10-18 | Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts | Yongxiang Hua et.al. | 2510.16448 | null |
| 2025-10-18 | Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures | Minh-Khoi Nguyen-Nhat et.al. | 2510.16411 | null |
| 2025-10-17 | Expert Merging in Sparse Mixture of Experts with Nash Bargaining | Dung V. Nguyen et.al. | 2510.16138 | null |
| 2025-10-17 | Human or AI? Comparing Design Thinking Assessments by Teaching Assistants and Bots | Sumbul Khan et.al. | 2510.16069 | null |
| 2025-10-17 | Mixture of Experts Approaches in Dense Retrieval Tasks | Effrosyni Sokli et.al. | 2510.15683 | null |
| 2025-10-17 | FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification | Zhen Sun et.al. | 2510.15595 | null |
| 2025-10-17 | Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks | Yuyuan Feng et.al. | 2510.15333 | null |
| 2025-10-17 | MTmixAtt: Integrating Mixture-of-Experts with Multi-Mix Attention for Large-Scale Recommendation | Xianyang Qi et.al. | 2510.15286 | null |
| 2025-10-17 | Adaptive Individual Uncertainty under Out-Of-Distribution Shift with Expert-Routed Conformal Prediction | Amitesh Badkul et.al. | 2510.15233 | null |
| 2025-10-16 | Rewiring Experts on the Fly:Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models | Guinan Su et.al. | 2510.14853 | null |
| 2025-10-16 | MergeMoE: Efficient Compression of MoE Models via Expert Output Merging | Ruijie Miao et.al. | 2510.14436 | null |
| 2025-10-16 | Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning | Weijie Shen et.al. | 2510.14300 | null |
| 2025-10-16 | MACE: Mixture-of-Experts Accelerated Coordinate Encoding for Large-Scale Scene Localization and Rendering | Mingkai Liu et.al. | 2510.14251 | null |
| 2025-10-16 | Demonstrating Exoplanet Transit Photometry from Space with a 15-mm Aperture Optical Navigation Camera on Hayabusa2 | Koki Yumoto et.al. | 2510.14229 | null |
| 2025-10-15 | REAP the Experts: Why Pruning Prevails for One-Shot MoE compression | Mike Lasby et.al. | 2510.13999 | null |
| 2025-10-15 | Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module | Ruitao Feng et.al. | 2510.13558 | null |
| 2025-10-15 | ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition | Deeptimaan Banerjee et.al. | 2510.13493 | null |
| 2025-10-15 | Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers | Xin Zhao et.al. | 2510.13462 | null |
| 2025-10-15 | Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts | Li Bai et.al. | 2510.13451 | null |
| 2025-10-15 | UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE | Zhenyu Liu et.al. | 2510.13344 | null |
| 2025-10-15 | GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models | Chen Zheng et.al. | 2510.13079 | null |
| 2025-10-17 | Scope: Selective Cross-modal Orchestration of Visual Perception Experts | Tianyu Zhang et.al. | 2510.12974 | null |
| 2025-10-14 | Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps | Do Tien Hai et.al. | 2510.12744 | null |
| 2025-10-14 | Proof of Cloud: Data Center Execution Assurance for Confidential VMs | Filip Rezabek et.al. | 2510.12469 | null |
| 2025-10-14 | MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts | Yushu Zhao et.al. | 2510.12357 | null |
| 2025-10-14 | DE3S: Dual-Enhanced Soft-Sparse-Shape Learning for Medical Early Time-Series Classification | Tao Xie et.al. | 2510.12214 | null |
| 2025-10-13 | Enhancing the Quality of 3D Lunar Maps Using JAXA’s Kaguya Imagery | Yumi Iwashita et.al. | 2510.11817 | null |
| 2025-10-13 | Beyond ‘Templates’: Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View | Jinyu Zhang et.al. | 2510.11687 | null |
| 2025-10-13 | Robust Ego-Exo Correspondence with Long-Term Memory | Yijun Hu et.al. | 2510.11417 | null |
| 2025-10-13 | Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers | Wenhan Ma et.al. | 2510.11370 | null |
| 2025-10-13 | What to expect from microscopic nuclear modelling for $k_{\rm eff}$ calculations? | D. Rochman et.al. | 2510.11256 | null |
| 2025-10-13 | DND: Boosting Large Language Models with Dynamic Nested Depth | Tieyuan Chen et.al. | 2510.11001 | null |
| 2025-10-13 | MC#: Mixture Compressor for Mixture-of-Experts Large Models | Wei Huang et.al. | 2510.10962 | null |
| 2025-10-12 | Crisis-Aware Regime-Conditioned Diffusion with CVaR Allocation | Ali Atiah Alzahrani et.al. | 2510.10807 | null |
| 2025-10-12 | Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection | Shizhen Zhao et.al. | 2510.10584 | null |
| 2025-10-12 | Hierarchical LoRA MoE for Efficient CTR Model Scaling | Zhichen Zeng et.al. | 2510.10432 | null |
| 2025-10-11 | SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference | Liangkun Chen et.al. | 2510.10302 | null |
| 2025-10-10 | MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest | Xiao Yang et.al. | 2510.09857 | null |
| 2025-10-10 | ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting | Jindong Tian et.al. | 2510.09734 | null |
| 2025-10-10 | Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation | Youwei Zheng et.al. | 2510.09094 | null |
| 2025-10-09 | LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution | Xiaohui Li et.al. | 2510.08771 | null |
| 2025-10-13 | dInfer: An Efficient Inference Framework for Diffusion Language Models | Yuxin Ma et.al. | 2510.08666 | null |
| 2025-10-08 | Dynamic Mixture-of-Experts for Visual Autoregressive Model | Jort Vincenti et.al. | 2510.08629 | null |
| 2025-10-09 | FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts | Heming Zou et.al. | 2510.08396 | null |
| 2025-10-09 | Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization | Jason Bohne et.al. | 2510.08256 | null |
| 2025-10-09 | From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill | Gunjun Lee et.al. | 2510.08055 | null |
| 2025-10-09 | Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training | Ruizhe Wang et.al. | 2510.08008 | null |
| 2025-10-09 | Multilingual Knowledge Graph Completion via Efficient Multilingual Knowledge Sharing | Cunli Mao et.al. | 2510.07736 | null |
| 2025-10-09 | Mutual Learning for Hashing: Unlocking Strong Hash Functions from Weak Supervision | Xiaoxu Ma et.al. | 2510.07703 | null |
| 2025-10-09 | LiveThinking: Enabling Real-Time Efficient Reasoning for AI-Powered Livestreaming via Reinforcement Learning | Yuhan Sun et.al. | 2510.07685 | null |
| 2025-10-08 | MoGU: Mixture-of-Gaussians with Uncertainty-based Gating for Time Series Forecasting | Yoli Shavit et.al. | 2510.07459 | null |
| 2025-10-08 | Less is More: Strategic Expert Selection Outperforms Ensemble Complexity in Traffic Forecasting | Walid Guettala et.al. | 2510.07426 | null |
| 2025-10-08 | Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts | Fangshuo Liao et.al. | 2510.07205 | null |
| 2025-10-08 | A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages | Zibo Su et.al. | 2510.06612 | null |
| 2025-10-09 | SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation | Shuang Cheng et.al. | 2510.06303 | null |
| 2025-10-06 | Reproducibility Study of “XRec: Large Language Models for Explainable Recommendation” | Ranjan Mishra et.al. | 2510.06275 | null |
| 2025-10-10 | Barbarians at the Gate: How AI is Upending Systems Research | Audrey Cheng et.al. | 2510.06189 | null |
| 2025-10-07 | CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credits | Kangyu Wang et.al. | 2510.06133 | null |
| 2025-10-07 | Rasterized Steered Mixture of Experts for Efficient 2D Image Regression | Yi-Hsin Li et.al. | 2510.05814 | null |
| 2025-10-07 | Mixture of Neuron Experts | Runxi Cheng et.al. | 2510.05781 | null |
| 2025-10-07 | MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition | Haoxun Li et.al. | 2510.05749 | null |
| 2025-10-07 | Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting | Zhongkai Yu et.al. | 2510.05497 | null |
| 2025-10-06 | Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving | Yue Pan et.al. | 2510.05245 | null |
| 2025-10-06 | REN: Anatomically-Informed Mixture-of-Experts for Interstitial Lung Disease Diagnosis | Alec K. Peltekian et.al. | 2510.04923 | null |
| 2025-10-06 | LMM-Incentive: Large Multimodal Model-based Incentive Design for User-Generated Content in Web 3.0 | Jinbo Wen et.al. | 2510.04765 | null |
| 2025-10-06 | Multilingual Routing in Mixture-of-Experts | Lucas Bandarkar et.al. | 2510.04694 | null |
| 2025-10-06 | Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing | Xuanhua Yin et.al. | 2510.04670 | null |
| 2025-10-06 | Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space | Tomas Figliolia et.al. | 2510.04476 | null |
| 2025-10-05 | HoRA: Cross-Head Low-Rank Adaptation with Joint Hypernetworks | Nghiem T. Diep et.al. | 2510.04295 | null |
| 2025-10-05 | SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling | Harshil Vejendla et.al. | 2510.04286 | null |
| 2025-10-05 | MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition | Umberto Cappellazzo et.al. | 2510.04136 | null |
| 2025-10-03 | Mixture of Many Zero-Compute Experts: A High-Rate Quantization Theory Perspective | Yehuda Dar et.al. | 2510.03151 | null |
| 2025-10-02 | ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models | Gursimran Singh et.al. | 2510.02613 | null |
| 2025-10-02 | UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models | Yuhao Sun et.al. | 2510.02194 | null |
| 2025-10-02 | LadderMoE: Ladder-Side Mixture of Experts Adapters for Bronze Inscription Recognition | Rixin Zhou et.al. | 2510.01651 | null |
| 2025-10-01 | Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs | Leyla Mirvakhabova et.al. | 2510.01185 | null |
| 2025-10-01 | Learning Compact Representations of LLM Abilities via Item Response Theory | Jianhao Chen et.al. | 2510.00844 | null |
| 2025-10-01 | Graph Integrated Multimodal Concept Bottleneck Model | Jiakai Lin et.al. | 2510.00701 | null |
| 2025-10-01 | FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression | Yifei Gao et.al. | 2510.00621 | null |
| 2025-10-01 | Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning | Minghao Yang et.al. | 2510.00570 | null |
| 2025-09-30 | FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training | Yunqi Gao et.al. | 2510.00207 | null |
| 2025-09-30 | Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization | Yaoxiang Wang et.al. | 2509.26520 | null |
| 2025-09-30 | Nephrobase Cell+: Multimodal Single-Cell Foundation Model for Decoding Kidney Biology | Chenyu Li et.al. | 2509.26223 | null |
| 2025-09-30 | Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline | Haiyang Li et.al. | 2509.25991 | null |
| 2025-09-30 | UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression | Yuan Zhao et.al. | 2509.25934 | null |
| 2025-09-30 | Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel | Chuanyang Zheng et.al. | 2509.25913 | null |
| 2025-10-01 | A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI | Arvind Murari Vepa et.al. | 2509.25889 | null |
| 2025-09-30 | Collaborative Compression for Large-Scale MoE Deployment on Edge | Yixiao Chen et.al. | 2509.25689 | null |
| 2025-09-30 | LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts | Yuan Zhuang et.al. | 2509.25684 | null |
| 2025-09-30 | Guiding Mixture-of-Experts with Temporal Multimodal Interactions | Xing Han et.al. | 2509.25678 | null |
| 2025-09-29 | K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model | Bangwei Guo et.al. | 2509.25594 | null |
| 2025-09-29 | GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference | Yu Han et.al. | 2509.25041 | null |
| 2025-09-29 | LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection | Bao-Ngoc Dao et.al. | 2509.24547 | null |
| 2025-11-03 | Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding | Zhibin Wang et.al. | 2508.21706 | null |
| 2025-07-22 | Mixture-of-Expert Variational Autoencoders for Cross-Modality Embedding of Type Ia Supernova Data | Yunyi Shen et.al. | 2507.16817 | null |
| 2025-07-22 | Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training | Zixiao Huang et.al. | 2507.16274 | null |
| 2025-07-21 | Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure | Alexandra Junell et.al. | 2507.16088 | null |
| 2025-07-21 | Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation | Alessandro B. Melchiorre et.al. | 2507.15826 | null |
| 2025-07-21 | RankMixer: Scaling Up Ranking Models in Industrial Recommenders | Jie Zhu et.al. | 2507.15551 | null |
| 2025-07-21 | The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts | Sungmin Yun et.al. | 2507.15465 | null |
| 2025-07-21 | Universal crystal material property prediction via multi-view geometric fusion in graph transformers | Liang Zhang et.al. | 2507.15303 | null |
| 2025-07-20 | CoMoCAVs: Cohesive Decision-Guided Motion Planning for Connected and Autonomous Vehicles with Multi-Policy Reinforcement Learning | Pan Hu et.al. | 2507.14903 | null |
| 2025-07-23 | GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving | Chi Wan et.al. | 2507.14456 | null |
| 2025-07-18 | SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing | Yingying Zhang et.al. | 2507.13812 | null |
| 2025-07-17 | Apple Intelligence Foundation Language Models: Tech Report 2025 | Hanzhi Zhou et.al. | 2507.13575 | null |
| 2025-07-17 | R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning | Xiaohan Guo et.al. | 2507.13107 | null |
| 2025-07-16 | Astro-MoE: Mixture of Experts for Multiband Astronomical Time Series | Martina Cádiz-Leyton et.al. | 2507.12611 | null |
| 2025-07-16 | Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models | Gen Luo et.al. | 2507.12566 | null |
| 2025-07-16 | Mixture of Raytraced Experts | Andrea Perin et.al. | 2507.12419 | null |
| 2025-07-16 | CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning | Peiwen Xia et.al. | 2507.11834 | null |
| 2025-07-09 | The AI Shadow War: SaaS vs. Edge Computing Architectures | Rhea Pritham Marpu et.al. | 2507.11545 | null |
| 2025-07-15 | Mixture of Experts in Large Language Models | Danyang Zhang et.al. | 2507.11181 | null |
| 2025-07-15 | Atmos-Bench: 3D Atmospheric Structures for Climate Insight | Tianchi Xu et.al. | 2507.11085 | null |
| 2025-07-14 | DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models | Luolin Xiong et.al. | 2507.09955 | null |
| 2025-07-14 | ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization | Huilai Li et.al. | 2507.09945 | null |
| 2025-07-14 | Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems | Vindula Jayawardana et.al. | 2507.09836 | null |
| 2025-07-18 | Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts | Aakash Tripathi et.al. | 2507.09754 | null |
| 2025-07-13 | Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive | You Huang et.al. | 2507.09612 | null |
| 2025-07-12 | PPJudge: Towards Human-Aligned Assessment of Artistic Painting Process | Shiqi Jiang et.al. | 2507.09242 | null |
| 2025-07-11 | SSH-Passkeys: Leveraging Web Authentication for Passwordless SSH | Moe Kayali et.al. | 2507.09022 | null |
| 2025-07-11 | BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity | Chenyang Song et.al. | 2507.08771 | null |
| 2025-07-11 | CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes | Tianyou Jiang et.al. | 2507.08542 | null |
| 2025-07-11 | White-Basilisk: A Hybrid Model for Code Vulnerability Detection | Ioannis Lamprou et.al. | 2507.08540 | null |
| 2025-07-21 | KAT-V1: Kwai-AutoThink Technical Report | Zizheng Zhan et.al. | 2507.08297 | null |
| 2025-07-11 | Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization | Woon Ryong Kim et.al. | 2507.08269 | null |
| 2025-07-10 | MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving | Lu Xu et.al. | 2507.07818 | null |
| 2025-07-10 | When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance | Peizhang Shao et.al. | 2507.07748 | null |
| 2025-07-09 | Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning | Ankit Jyothish et.al. | 2507.07335 | null |
| 2025-07-08 | Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate | A. Bochkov et.al. | 2507.07129 | null |
| 2025-07-07 | Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding | Nidhi Bhatia et.al. | 2507.07120 | null |
| 2025-06-03 | Multi-level Mixture of Experts for Multimodal Entity Linking | Zhiwei Hu et.al. | 2507.07108 | null |
| 2025-07-09 | 4KAgent: Agentic Any Image to 4K Super-Resolution | Yushen Zuo et.al. | 2507.07105 | null |
| 2025-07-11 | FlexOlmo: Open Language Models for Flexible Data Use | Weijia Shi et.al. | 2507.07024 | null |
| 2025-07-09 | Deep Disentangled Representation Network for Treatment Effect Estimation | Hui Meng et.al. | 2507.06650 | null |
| 2025-07-09 | SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference | Qian Chen et.al. | 2507.06567 | null |
| 2025-07-09 | MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models | Yiwen Liu et.al. | 2507.06502 | null |
| 2025-07-08 | Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation | Szymon Płotka et.al. | 2507.06363 | null |
| 2025-07-08 | Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis | Xintong Hu et.al. | 2507.06116 | null |
| 2025-07-09 | A Survey on Prompt Tuning | Zongqian Li et.al. | 2507.06085 | null |
| 2025-07-08 | Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors | Bing Wang et.al. | 2507.05939 | null |
| 2025-07-08 | What You Have is What You Track: Adaptive and Robust Multimodal Tracking | Yuedong Tan et.al. | 2507.05899 | null |
| 2025-07-21 | Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition | Zijin Gu et.al. | 2507.05724 | null |
| 2025-07-08 | Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach | Xiaobing Chen et.al. | 2507.05685 | null |
| 2025-07-08 | City-Level Foreign Direct Investment Prediction with Tabular Learning on Judicial Data | Tianxing Wu et.al. | 2507.05651 | null |
| 2025-07-07 | QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks | Hoang-Quan Nguyen et.al. | 2507.05190 | null |
| 2025-07-07 | NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification | Jun Hu et.al. | 2507.04870 | null |
| 2025-07-07 | UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization | Kai Yang et.al. | 2507.04706 | null |
| 2025-07-07 | DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics | Yayu Long et.al. | 2507.04661 | null |
| 2025-07-08 | UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification | Xixi Wan et.al. | 2507.04638 | null |
| 2025-07-07 | Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts | Yun Wang et.al. | 2507.04631 | null |
| 2025-07-06 | Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts | Guokan Shang et.al. | 2507.04569 | null |
| 2025-07-22 | Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge | Linshen Liu et.al. | 2507.04123 | null |
| 2025-07-05 | From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM | Xinyi Wu et.al. | 2507.03868 | null |
| 2025-07-04 | Decoupled Relative Learning Rate Schedules | Jan Ludziejewski et.al. | 2507.03526 | null |
| 2025-07-03 | Neural Inhibition Improves Dynamic Routing and Mixture of Experts | Will Y. Zou et.al. | 2507.03221 | null |
| 2025-07-02 | Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model! | Do-hyeon Yoon et.al. | 2507.03014 | null |
| 2025-07-03 | System-performance and cost modeling of Large Language Model training and inference | Wenzhe Guo et.al. | 2507.02456 | null |
| 2025-07-03 | NLP4Neuro: Sequence-to-sequence learning for neural population decoding | Jacob J. Morra et.al. | 2507.02264 | null |
| 2025-07-02 | MoIRA: Modular Instruction Routing Architecture for Multi-Task Robotics | Dmytro Kuzmenko et.al. | 2507.01843 | null |
| 2025-07-02 | GradMetaNet: An Equivariant Architecture for Learning on Gradients | Yoav Gelberg et.al. | 2507.01649 | null |
| 2025-07-02 | Mixtures of Neural Network Experts with Application to Phytoplankton Flow Cytometry Data | Ethan Pawl et.al. | 2507.01375 | null |
| 2025-07-02 | Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model | Chaoxiang Cai et.al. | 2507.01351 | null |
| 2025-07-02 | Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations | Bohao Wang et.al. | 2507.01337 | null |
| 2025-07-02 | ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation | JianChao Zhao et.al. | 2507.00502 | null |
| 2025-07-01 | MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE | Geng Zhang et.al. | 2507.00390 | null |
| 2025-06-30 | Engineering NV Centers via Hydrogen-Driven Defect Chemistry in CVD Diamonds for Quantum Applications: NVHx Dissociations into NV, Origin of 468nm Center, and Cause of Brown Coloration | Mubashir Mansoor et.al. | 2507.00300 | null |
| 2025-06-17 | LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing | Wenbing Li et.al. | 2507.00029 | null |
| 2025-06-30 | MotionGPT3: Human Motion as a Second Modality | Bingfan Zhu et.al. | 2506.24086 | null |
| 2025-06-30 | MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis | Zhe Liu et.al. | 2506.23648 | null |
| 2025-06-30 | Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model | Mu-Chi Chen et.al. | 2506.23635 | null |
| 2025-07-01 | Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging | Lujun Li et.al. | 2506.23266 | null |
| 2025-06-29 | External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting | Haoran Li et.al. | 2506.23201 | null |
| 2025-06-29 | Hierarchical Corpus-View-Category Refinement for Carotid Plaque Risk Grading in Ultrasound | Zhiyuan Zhu et.al. | 2506.23108 | null |
| 2025-07-01 | Hecto: Modular Sparse Experts for Adaptive and Interpretable Reasoning | Sanskar Pandey et.al. | 2506.22919 | null |
| 2025-06-27 | QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization | Danush Khanna et.al. | 2506.22396 | null |
| 2025-06-27 | Towards Distributed Neural Architectures | Aditya Cowsik et.al. | 2506.22389 | null |
| 2025-06-27 | MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism | Zheng Zhang et.al. | 2506.22175 | null |
| 2025-07-09 | DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE | Hang Shao et.al. | 2506.21864 | null |
| 2025-06-21 | AdaptGOT: A Pre-trained Model for Adaptive Contextual POI Representation Learning | Xiaobin Ren et.al. | 2506.21612 | null |
| 2025-06-26 | Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts | Jiajie Yang et.al. | 2506.21328 | null |
| 2025-06-26 | Learning to Skip the Middle Layers of Transformers | Tim Lawson et.al. | 2506.21103 | null |
| 2025-06-26 | Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning | Haodong Lu et.al. | 2506.21035 | null |
| 2025-06-26 | EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning | Xiao Zhang et.al. | 2506.20986 | null |
| 2025-06-30 | The Singapore Consensus on Global AI Safety Research Priorities | Yoshua Bengio et.al. | 2506.20702 | null |
| 2025-06-17 | Utility-Driven Speculative Decoding for Mixture-of-Experts | Anish Saxena et.al. | 2506.20675 | null |
| 2025-06-25 | Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration | Jiaxing Huang et.al. | 2506.20282 | null |
| 2025-06-24 | Integrating Pair Programming as a Work Practice | Nina Haugland Andersen et.al. | 2506.19511 | null |
| 2025-07-05 | The Hα line as a probe of chromospheric magnetic fields | Harsh Mathur et.al. | 2506.19510 | null |
| 2025-06-23 | Multimodal Anomaly Detection with a Mixture-of-Experts | Christoph Willibald et.al. | 2506.19077 | null |
| 2025-06-23 | Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models | Zihan Wang et.al. | 2506.18945 | null |
| 2025-06-23 | Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning | Rahul Atul Bhope et.al. | 2506.18789 | null |
| 2025-06-23 | An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify | Shivam Verma et.al. | 2506.18735 | null |
| 2025-06-23 | Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks | Xiaodong Wu et.al. | 2506.18543 | null |
| 2025-06-23 | SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation | Zichong Li et.al. | 2506.18349 | null |
| 2025-06-23 | Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies | Junchao Fan et.al. | 2506.18304 | null |
| 2025-06-22 | Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection | Zheng Zhan et.al. | 2506.18145 | null |
| 2025-06-21 | Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert | Gelei Xu et.al. | 2506.17787 | null |
| 2025-06-21 | Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities | Xinghao Huang et.al. | 2506.17755 | null |
| 2025-06-21 | PDC-Net: Pattern Divide-and-Conquer Network for Pelvic Radiation Injury Segmentation | Xinyu Xiong et.al. | 2506.17712 | null |
| 2025-06-20 | SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification | Zhenglin Lai et.al. | 2506.17368 | null |
| 2025-07-14 | FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE | Khiem Le et.al. | 2506.16600 | null |
| 2025-06-19 | Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models | Daniel Fidel Harvey et.al. | 2506.16419 | null |
| 2025-06-19 | DCFNet: Doppler Correction Filter Network for Integrated Sensing and Communication in Multi-User MIMO-OFDM Systems | Hyeonho Noh et.al. | 2506.16191 | null |
| 2025-06-17 | Scaling Intelligence: Designing Data Centers for Next-Gen Language Models | Jesmin Jahan Tithi et.al. | 2506.15006 | null |
| 2025-06-17 | NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification | Wajih Hassan Raza et.al. | 2506.14970 | null |
| 2025-06-17 | Narrowing the Gap between TEEs Threat Model and Deployment Strategies | Filip Rezabek et.al. | 2506.14964 | null |
| 2025-05-31 | Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors | Henrik Klagges et.al. | 2506.14794 | null |
| 2025-06-19 | Integrating Dynamical Systems Learning with Foundational Models: A Meta-Evolutionary AI Framework for Clinical Trials | Joseph Geraci et.al. | 2506.14782 | null |
| 2025-06-17 | GMT: General Motion Tracking for Humanoid Whole-Body Control | Zixuan Chen et.al. | 2506.14770 | null |
| 2025-06-17 | Exploring Speaker Diarization with Mixture of Experts | Gaobin Yang et.al. | 2506.14750 | null |
| 2025-06-18 | Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Ling Team et.al. | 2506.14731 | null |
| 2025-09-23 | GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors | Hengyuan Zhang et.al. | 2506.14646 | null |
| 2025-06-17 | Single-Example Learning in a Mixture of GPDMs with Latent Geometries | Jesse St. Amand et.al. | 2506.14563 | null |
| 2025-06-30 | MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation | Shen Yuan et.al. | 2506.14436 | link |
| 2025-06-17 | MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models | Hongyu Wang et.al. | 2506.14435 | null |
| 2025-06-17 | Less is More: Undertraining Experts Improves Model Upcycling | Stefan Horoi et.al. | 2506.14126 | null |
| 2025-06-16 | Load Balancing Mixture of Experts with Similarity Preserving Routers | Nabil Omi et.al. | 2506.14038 | null |
| 2025-06-16 | GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics | Qianzhong Chen et.al. | 2506.14009 | null |
| 2025-06-16 | MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | MiniMax et.al. | 2506.13585 | link |
| 2025-06-16 | Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization | Guanghui Song et.al. | 2506.13541 | null |
| 2025-07-04 | EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization | Zhongqian Fu et.al. | 2506.13329 | link |
| 2025-06-16 | Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs | Xintong Tang et.al. | 2506.13192 | null |
| 2025-06-19 | Serving Large Language Models on Huawei CloudMatrix384 | Pengfei Zuo et.al. | 2506.12708 | null |
| 2025-06-14 | Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts | Shengzhuang Chen et.al. | 2506.12597 | null |
| 2025-06-14 | Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control | Rongpeng Li et.al. | 2506.12453 | null |
| 2025-06-17 | HarMoEny: Efficient Multi-GPU Inference of MoE Models | Zachary Doucet et.al. | 2506.12417 | null |
| 2025-06-14 | Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model | Chong Li et.al. | 2506.12388 | null |
| 2025-06-13 | Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources? | Houyi Li et.al. | 2506.12119 | null |
| 2025-06-13 | Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution | Zhangkai Ni et.al. | 2506.11823 | link |
| 2025-05-21 | MoTE: Mixture of Task-specific Experts for Pre-Trained ModelBased Class-incremental Learning | Linjie Li et.al. | 2506.11038 | null |
| 2025-04-23 | Test code generation at Ericsson using Program Analysis Augmented Fine Tuned LLMs | Sai Krishna et.al. | 2506.11006 | null |
| 2025-06-12 | Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts | Zaijing Li et.al. | 2506.10357 | null |
| 2025-06-12 | Technical Report with Proofs for A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments | Philipp Bär et.al. | 2506.10345 | null |
| 2025-06-13 | A Survey of Generative Categories and Techniques in Multimodal Large Language Models | Longzhen Han et.al. | 2506.10016 | null |
| 2025-06-11 | GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture | GigaChat team et.al. | 2506.09440 | null |
| 2025-06-11 | DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts | Yuchen Feng et.al. | 2506.09351 | null |
| 2025-06-11 | Ming-Omni: A Unified Multimodal Model for Perception and Generation | Inclusion AI et.al. | 2506.09344 | link |
| 2025-06-10 | CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks | Yixuan Li et.al. | 2506.08931 | null |
| 2025-06-10 | CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA | Jiale Dong et.al. | 2506.08496 | link |
| 2025-06-11 | MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding | Shivang Chopra et.al. | 2506.08356 | null |
| 2025-06-09 | Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting | Timothée Hornek Amir Sartipi et.al. | 2506.08113 | null |
| 2025-06-11 | STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation | Yiming Wang et.al. | 2506.08054 | link |
| 2025-06-09 | A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling | Jacob Helwig et.al. | 2506.07969 | link |
| 2025-06-09 | New Insights into the T Tauri Binary Separation Distribution | Caleb Eastlund et.al. | 2506.07938 | null |
| 2025-06-09 | M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration | Yongzhen Wang et.al. | 2506.07814 | null |
| 2025-07-23 | MIRA: Medical Time Series Foundation Model for Real-World Health Data | Hao Li et.al. | 2506.07584 | null |
| 2025-06-11 | MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization | Ken Yaggel et.al. | 2506.07563 | link |
| 2025-06-09 | MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts | Wei Tao et.al. | 2506.07533 | null |
| 2025-06-09 | Graph-of-Causal Evolution: Challenging Chain-of-Model for Reasoning | Libo Wang et.al. | 2506.07501 | null |
| 2025-06-09 | MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing | Haiyue Ma et.al. | 2506.07366 | null |
| 2025-06-08 | UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment | Wentao Zhao et.al. | 2506.07013 | null |
| 2025-06-07 | High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations | Ziwei Li et.al. | 2506.06858 | null |
| 2025-06-07 | Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning | Yuan Yuan et.al. | 2506.06694 | null |
| 2025-06-25 | SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities | Guoyang Xia et.al. | 2506.06406 | null |
| 2025-05-27 | MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes | Feiyang Pan et.al. | 2506.06318 | null |
| 2025-06-06 | Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization | Jonathan Yang et.al. | 2506.06196 | null |
| 2025-06-06 | MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models | Jie Cao et.al. | 2506.05928 | null |
| 2025-06-06 | dots.llm1 Technical Report | Bi Huo et.al. | 2506.05767 | null |
| 2025-06-05 | Mixture-of-Experts Meets In-Context Reinforcement Learning | Wenhao Wu et.al. | 2506.05426 | null |
| 2025-06-20 | Kinetics: Rethinking Test-Time Scaling Laws | Ranajoy Sadhukhan et.al. | 2506.05333 | link |
| 2025-06-05 | Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection | Ziyi Zhou et.al. | 2506.04739 | null |
| 2025-06-09 | FlashDMoE: Fast Distributed MoE in a Single Kernel | Osayamen Jonathan Aimuyo et.al. | 2506.04667 | link |
| 2025-06-04 | Out-of-Distribution Graph Models Merging | Yidi Wang et.al. | 2506.03674 | null |
| 2025-06-04 | Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts | Jiaxing Zhang et.al. | 2506.03591 | null |
| 2025-06-04 | PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs | Ze Yu Zhang et.al. | 2506.02965 | null |
| 2025-06-03 | Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights | Jakub Krajewski et.al. | 2506.02890 | null |
| 2025-06-03 | Brain-Like Processing Pathways Form in Models With Heterogeneous Experts | Jack Cook et.al. | 2506.02813 | null |
| 2025-06-04 | MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection | Juntong Li et.al. | 2506.02535 | null |
| 2025-06-03 | MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework | Yupeng Qi et.al. | 2506.02460 | null |
| 2025-05-31 | Enhancing Multimodal Continual Instruction Tuning with BranchLoRA | Duzhen Zhang et.al. | 2506.02041 | null |
| 2025-06-02 | SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model | Zhao Yang et.al. | 2506.01833 | link |
| 2025-06-02 | Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning | Ryotaro Kawata et.al. | 2506.01656 | null |
| 2025-06-02 | DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models | Jiancheng Ye et.al. | 2506.01257 | null |
| 2025-06-01 | Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts | Fan Liu et.al. | 2506.00965 | null |
| 2025-05-31 | FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts | Xinyi Wang et.al. | 2506.00495 | null |
| 2025-05-30 | Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction | Shuai Liu et.al. | 2505.24597 | null |
| 2025-06-11 | Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis | Junzhuo Li et.al. | 2505.24593 | null |
| 2025-05-30 | Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer | Yilun Kong et.al. | 2505.24378 | link |
| 2025-05-30 | GradPower: Powering Gradients for Faster Language Model Pre-Training | Mingze Wang et.al. | 2505.24275 | null |
| 2025-05-30 | On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks | Mingze Wang et.al. | 2505.24205 | null |
| 2025-06-02 | Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts | Xuweiyi Chen et.al. | 2505.23926 | null |
| 2025-06-09 | Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert | Zhaokun Wang et.al. | 2505.23868 | null |
| 2025-05-29 | Revisiting Uncertainty Estimation and Calibration of Large Language Models | Linwei Tao et.al. | 2505.23854 | null |
| 2025-05-28 | EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models | Linglin Jing et.al. | 2505.23830 | null |
| 2025-06-03 | LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions | Hadi Askari et.al. | 2505.23811 | null |
| 2025-05-29 | From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents | Tobias Lindenbauer et.al. | 2505.23422 | link |
| 2025-05-29 | Context-Aware Semantic Communication for the Wireless Networks | Guangyuan Liu et.al. | 2505.23249 | null |
| 2025-05-29 | Two Is Better Than One: Rotations Scale LoRAs | Hongcan Guo et.al. | 2505.23184 | null |
| 2025-05-28 | HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | Qi Cai et.al. | 2505.22705 | link |
| 2025-05-28 | Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts | Xue Zhang et.al. | 2505.22582 | null |
| 2025-05-28 | A Human-Centric Approach to Explainable AI for Personalized Education | Vinitra Swamy et.al. | 2505.22541 | link |
| 2025-05-28 | Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion | Kewen Chen et.al. | 2505.22360 | null |
| 2025-05-28 | Advancing Expert Specialization for Better MoE | Hongcan Guo et.al. | 2505.22323 | null |
| 2025-05-28 | ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation | Jiawen Yu et.al. | 2505.22159 | null |
| 2025-05-28 | On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition | Shujie HU et.al. | 2505.22072 | null |
| 2025-05-28 | AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation | Yan Rong et.al. | 2505.22053 | null |
| 2025-05-29 | ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | Zhongyi Zhou et.al. | 2505.21906 | null |
| 2025-05-27 | MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis | Yitong Li et.al. | 2505.21698 | null |
| 2025-05-23 | EvidenceMoE: A Physics-Guided Mixture-of-Experts with Evidential Critics for Advancing Fluorescence Light Detection and Ranging in Scattering Media | Ismail Erbas et.al. | 2505.21532 | null |
| 2025-05-29 | Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity | Yehui Tang et.al. | 2505.21411 | null |
| 2025-05-27 | Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM’s Instruction-Following Capabilities | Junyan Zhang et.al. | 2505.21191 | null |
| 2025-05-27 | Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts | Yue Zhang et.al. | 2505.21079 | null |
| 2025-05-27 | Multi-objective Large Language Model Alignment with Hierarchical Experts | Zhuo Li et.al. | 2505.20925 | null |
| 2025-05-27 | FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models | Hao Kang et.al. | 2505.20225 | null |
| 2025-06-01 | NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID | Shihao Li et.al. | 2505.20001 | null |
| 2025-05-26 | Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments | Junming Liu et.al. | 2505.19699 | null |
| 2025-06-13 | MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE | Zongle Huang et.al. | 2505.19645 | null |
| 2025-05-26 | Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate | Liangwei Nathan Zheng et.al. | 2505.19525 | link |
| 2025-05-26 | WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference | Sihan Chen et.al. | 2505.19427 | link |
| 2025-05-25 | RankLLM: A Python Package for Reranking with LLMs | Sahel Sharifymoghaddam et.al. | 2505.19284 | null |
| 2025-05-25 | I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Jiayi Xin et.al. | 2505.19190 | link |
| 2025-05-24 | TrajMoE: Spatially-Aware Mixture of Experts for Unified Human Mobility Modeling | Chonghua Han et.al. | 2505.18670 | null |
| 2025-05-24 | ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation | Jian Liang et.al. | 2505.18640 | link |
| 2025-07-02 | Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter | Weizhi Zhong et.al. | 2505.18612 | null |
| 2025-05-24 | Guiding the Experts: Semantic Priors for Efficient and Focused MoE Routing | Chengxi Min et.al. | 2505.18586 | link |
| 2025-05-24 | Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning | Aofei Chang et.al. | 2505.18503 | null |
| 2025-05-24 | On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts | Fanqi Yan et.al. | 2505.18455 | null |
| 2025-05-24 | μ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts | Toshiaki Koike-Akino et.al. | 2505.18451 | null |
| 2025-05-23 | Betelgeuse’s Buddy: X-Ray Constraints on the Nature of α Ori B | Anna J. G. O’Grady et.al. | 2505.18376 | null |
| 2025-05-23 | Betelgeuse, Betelgeuse, Betelgeuse, Betel-buddy? Constraints on the dynamical companion to α Orionis from HST | Jared A. Goldberg et.al. | 2505.18375 | null |
| 2025-05-13 | Constrained Edge AI Deployment: Fine-Tuning vs Distillation for LLM Compression | Jacob Sander et.al. | 2505.18166 | null |
| 2025-05-23 | Enhancing CTR Prediction with De-correlated Expert Networks | Jiancheng Wang et.al. | 2505.17925 | null |
| 2025-05-23 | PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval | Zehua Pei et.al. | 2505.17639 | null |
| 2025-05-23 | CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning | Jinyuan Feng et.al. | 2505.17553 | null |
| 2025-05-31 | MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation | Kaixing Yang et.al. | 2505.17543 | null |
| 2025-07-04 | JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | Qihao Duan et.al. | 2505.17257 | null |
| 2025-05-31 | TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling | Weizhe Lin et.al. | 2505.17155 | null |
| 2025-05-22 | DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving | Zhenjie Yang et.al. | 2505.16278 | null |
| 2025-05-22 | DualComp: End-to-End Learning of a Unified Dual-Modality Lossless Compressor | Yan Zhao et.al. | 2505.16256 | null |
| 2025-05-21 | Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models | Jingcong Liang et.al. | 2505.16056 | link |
| 2025-05-26 | MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding | Yuxiang Wei et.al. | 2505.15946 | null |
| 2025-05-21 | Who “Controls” Where Work Shall be Done? State-of-Practice in Post-Pandemic Remote Work Regulation | Darja Smite et.al. | 2505.15743 | null |
| 2025-05-21 | CoLA: Collaborative Low-Rank Adaptation | Yiyun Zhou et.al. | 2505.15471 | link |
| 2025-07-04 | Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought | Tencent Hunyuan Team et.al. | 2505.15431 | null |
| 2025-05-21 | Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks | Uranik Berisha et.al. | 2505.15414 | null |
| 2025-05-21 | Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites | Xintong Wang et.al. | 2505.15297 | null |
| 2025-05-21 | Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines | Xiaohou Shi et.al. | 2505.15151 | null |
| 2025-05-20 | Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies | Haoyi Qiu et.al. | 2505.14972 | link |
| 2025-05-30 | TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis | Yu Zhang et.al. | 2505.14910 | link |
| 2025-05-20 | Balanced and Elastic End-to-end Training of Dynamic LLMs | Mohamed Wahib et.al. | 2505.14864 | null |
| 2025-05-20 | Solving MNIST with a globally trained Mixture of Quantum Experts | Paolo Alessandro Xavier Tognini et.al. | 2505.14789 | null |
| 2025-05-27 | Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training | Mengru Wang et.al. | 2505.14681 | null |
| 2025-05-21 | Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | Umberto Cappellazzo et.al. | 2505.14336 | null |
| 2025-05-20 | FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation | Shaolin Zhu et.al. | 2505.14256 | null |
| 2025-05-20 | THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation | Yunlong Liang et.al. | 2505.14173 | null |
| 2025-05-20 | Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition | Shuo Zhang et.al. | 2505.14143 | null |
| 2025-05-20 | Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging | Ryo Bertolissi et.al. | 2505.14136 | null |
| 2025-05-20 | Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts | Xi Chen et.al. | 2505.14088 | null |
| 2025-05-20 | StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning | Huaijie Wang et.al. | 2505.13997 | null |
| 2025-05-20 | Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting | Bao-Ngoc Dao et.al. | 2505.13944 | link |
| 2025-05-27 | U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding | Ziqian Wang et.al. | 2505.13880 | link |
| 2025-05-20 | EfficientLLM: Efficiency in Large Language Models | Zhengqing Yuan et.al. | 2505.13840 | null |
| 2025-05-19 | CompeteSMoE – Statistically Guaranteed Mixture of Experts Training via Competition | Nam V. Nguyen et.al. | 2505.13380 | link |
| 2025-05-19 | Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | Shuqing Luo et.al. | 2505.13345 | link |
| 2025-05-19 | Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models | Lucas Berry et.al. | 2505.13273 | null |
| 2025-05-19 | True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics | Christoph Jürgen Hemmer et.al. | 2505.13192 | null |
| 2025-05-23 | Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures | Tuan Thai et.al. | 2505.13052 | null |
| 2025-05-19 | TransferTraj: A Vehicle Trajectory Learning Model for Region and Task Transferability | Tonglong Wei et.al. | 2505.12672 | null |
| 2025-05-30 | Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization | Hongbiao Zhu et.al. | 2505.12311 | null |
| 2025-05-22 | Model Merging in Pre-training of Large Language Models | Yunshui Li et.al. | 2505.12082 | null |
| 2025-05-22 | Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition | Runduo Han et.al. | 2505.12007 | link |
| 2025-05-17 | MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging | Zihuan Qiu et.al. | 2505.11883 | null |
| 2025-05-17 | Improving Coverage in Combined Prediction Sets with Weighted p-values | Gina Wong et.al. | 2505.11785 | null |
| 2025-05-16 | HessFormer: Hessians at Foundation Scale | Diego Granziol et.al. | 2505.11564 | null |
| 2025-05-10 | PRIME: Physics-Related Intelligent Mixture of Experts for Transistor Characteristics Prediction | Zhenxing Dou et.al. | 2505.11523 | null |
| 2025-05-19 | MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production | Chao Jin et.al. | 2505.11432 | null |
| 2025-05-21 | MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | Yinsicheng Jiang et.al. | 2505.11415 | null |
| 2025-05-16 | A Fast Kernel-based Conditional Independence test with Application to Causal Discovery | Oliver Schacht et.al. | 2505.11085 | null |
| 2025-05-16 | On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating | Huy Nguyen et.al. | 2505.10860 | null |
| 2025-05-14 | PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | Zongqian Li et.al. | 2505.09519 | link |
| 2025-05-14 | Qwen3 Technical Report | An Yang et.al. | 2505.09388 | link |
| 2025-05-14 | Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures | Chenggang Zhao et.al. | 2505.09343 | null |
| 2025-05-29 | Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony | Shaoyu Wang et.al. | 2505.08944 | null |
| 2025-05-13 | PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts | Yang Su et.al. | 2505.08719 | null |
| 2025-05-25 | AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale | Yunjie Ji et.al. | 2505.08311 | null |
| 2025-05-12 | UMoE: Unifying Attention and FFN with Shared Experts | Yuanhang Yang et.al. | 2505.07260 | null |
| 2025-05-11 | Seed1.5-VL Technical Report | Dong Guo et.al. | 2505.07062 | null |
| 2025-05-21 | FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers | Tianyu Chen et.al. | 2505.06858 | null |
| 2025-05-11 | The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts | Enric Boix-Adsera et.al. | 2505.06839 | null |
| 2025-05-10 | Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | Zihan Qiu et.al. | 2505.06708 | link |
| 2025-05-30 | Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | Dawei Huang et.al. | 2505.06685 | link |
| 2025-05-10 | QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration | HamidReza Imani et.al. | 2505.06481 | null |
| 2025-05-06 | A Sensitivity-Driven Expert Allocation Method in LoRA-MoE for Efficient Fine-Tuning | Junzhou Xu et.al. | 2505.06272 | null |
| 2025-05-12 | FloE: On-the-Fly MoE Inference on Memory-constrained GPU | Yuxin Zhou et.al. | 2505.05950 | null |
| 2025-05-09 | MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design | Haojie Duanmu et.al. | 2505.05799 | link |
| 2025-05-10 | SDR-RDMA: Software-Defined Reliability Architecture for Planetary Scale RDMA Communication | Mikhail Khalilov et.al. | 2505.05366 | null |
| 2025-05-08 | Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts | Ming Li et.al. | 2505.05035 | null |
| 2025-05-07 | Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs | Yehui Tang et.al. | 2505.04519 | null |
| 2025-05-07 | SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios | Ning Cheng et.al. | 2505.04201 | null |
| 2025-05-07 | LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? | Teddy Foley et.al. | 2505.04075 | link |
| 2025-05-07 | Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications | Yuanai Xie et.al. | 2505.04068 | null |
| 2025-05-24 | Perception-Informed Neural Networks: Beyond Physics-Informed Neural Networks | Mehran Mazandarani et.al. | 2505.03806 | null |
| 2025-05-02 | MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance | Xing Hu et.al. | 2505.03804 | null |
| 2025-05-06 | Towards Smart Point-and-Shoot Photography | Jiawan Li et.al. | 2505.03638 | null |
| 2025-05-06 | Faster MoE LLM Inference for Extremely Large Models | Haoqi Yang et.al. | 2505.03531 | null |
| 2025-05-06 | STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation | Maolin Wang et.al. | 2505.03484 | null |
| 2025-05-06 | 3D Gaussian Splatting Data Compression with Mixture of Priors | Lei Liu et.al. | 2505.03310 | null |
| 2025-05-05 | Finger Pose Estimation for Under-screen Fingerprint Sensor | Xiongjun Guan et.al. | 2505.02481 | link |
| 2025-05-05 | Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems | Kai Zhang et.al. | 2505.02381 | null |
| 2025-05-08 | Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques | Sanjay Surendranath Girija et.al. | 2505.02309 | null |
| 2025-05-04 | Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | Zhenxing Mi et.al. | 2505.02005 | link |
| 2025-05-03 | Backdoor Attacks Against Patch-based Mixture of Experts | Cedric Chan et.al. | 2505.01811 | link |
| 2025-05-01 | MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling | Abdoul Majid O. Thiombiano et.al. | 2505.01459 | null |
| 2025-05-02 | Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders | Rogelio A Mancisidor et.al. | 2505.01134 | null |
| 2025-05-02 | CoCoAFusE: Beyond Mixtures of Experts via Model Fusion | Aurelio Raffa Ugolini et.al. | 2505.01105 | null |
| 2025-05-01 | Improving Routing in Sparse Mixture of Experts with Graph of Tokens | Tam Nguyen et.al. | 2505.00792 | null |
| 2025-05-01 | CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series | Tian Lan et.al. | 2505.00415 | null |
| 2025-05-01 | Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing | Piotr Piękos et.al. | 2505.00315 | link |
| 2025-04-30 | Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders | Xuwei Yang et.al. | 2505.00216 | null |
| 2025-05-08 | Identifying Critical Dependencies in Large-Scale Continuous Software Engineering | Anastasiia Tkalich et.al. | 2504.21437 | null |
| 2025-04-29 | TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts | Pradip Kunwar et.al. | 2504.21190 | null |
| 2025-04-29 | Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization | Shuai Gong et.al. | 2504.21063 | null |
| 2025-04-26 | PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight | Ben Goertzel et.al. | 2504.21029 | null |
| 2025-04-29 | In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer | Zechuan Zhang et.al. | 2504.20690 | null |
| 2025-05-30 | ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting | Yu Zhang et.al. | 2504.20630 | null |
| 2025-04-29 | MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification | Yichu Xu et.al. | 2504.20509 | null |
| 2025-04-29 | FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks | Wenjing Xiao et.al. | 2504.20446 | null |
| 2025-04-29 | MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Amaan Izhar et.al. | 2504.20343 | link |
| 2025-04-28 | Accelerating Mixture-of-Experts Training with Adaptive Expert Replication | Athinagoras Skiadopoulos et.al. | 2504.19925 | null |
| 2025-04-28 | DUETS: Setting expectations for asteroseismic binaries and binary products with synthetic populations | A. Mazzi et.al. | 2504.19866 | null |
| 2025-04-28 | Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey | Yunting Xu et.al. | 2504.19660 | null |
| 2025-05-04 | ARTEMIS: Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving | Renju Feng et.al. | 2504.19580 | link |
| 2025-05-30 | Versatile Framework for Song Generation with Prompt-based Control | Yu Zhang et.al. | 2504.19062 | null |
| 2025-04-29 | BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts | Qingyue Wang et.al. | 2504.18598 | null |
| 2025-04-25 | NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation | Rob Romijnders et.al. | 2504.18147 | null |
| 2025-05-15 | TGDT: A Temporal Graph-based Digital Twin for Urban Traffic Corridors | Nooshin Yousefzadeh et.al. | 2504.18008 | null |
| 2025-06-11 | Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection | Haokai Zhang et.al. | 2504.17834 | link |
| 2025-04-22 | Compass-V2 Technical Report | Sophia Maria et.al. | 2504.15527 | null |
| 2025-04-21 | Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images | Jonathan Brokman et.al. | 2504.15470 | link |
| 2025-04-17 | D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | Haodong Wang et.al. | 2504.15299 | null |
| 2025-04-23 | MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core | Dennis Liu et.al. | 2504.14960 | null |
| 2025-04-20 | Evaluating Temporal Plasticity in Foundation Time Series Models for Incremental Fine-tuning | Jia Liu et.al. | 2504.14677 | null |
| 2025-04-29 | Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning | ByteDance Seed et.al. | 2504.13914 | null |
| 2025-04-18 | Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts | Jie Zou et.al. | 2504.13655 | null |
| 2025-04-18 | HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering | Alexander Rusnak et.al. | 2504.13590 | null |
| 2025-04-18 | Dense Backpropagation Improves Training for Sparse Mixture-of-Experts | Ashwinee Panda et.al. | 2504.12463 | link |
| 2025-04-16 | Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models | Yuanbo Tang et.al. | 2504.12359 | null |
| 2025-04-16 | Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data | Sangwon Hyun et.al. | 2504.12287 | null |
| 2025-04-16 | The Discovery of Two Quadruple Star Systems with the Second and Third Shortest Outer Periods | Brian P. Powell et.al. | 2504.12239 | null |
| 2025-04-16 | MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models | Hang Yuan et.al. | 2504.12234 | null |
| 2025-04-13 | Transmission of low energy electrons through a polyethylene terephthalate 800-nm diameter nanocapillary | Li Pengfei et.al. | 2504.11479 | null |
| 2025-04-15 | Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology | Henrik Häggström et.al. | 2504.11279 | link |
| 2025-05-22 | Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability | Jiani Liu et.al. | 2504.10804 | null |
| 2025-04-14 | Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning | LeiLei Ma et.al. | 2504.09990 | null |
| 2025-04-14 | DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training | Masahiro Tanaka et.al. | 2504.09983 | null |
| 2025-04-14 | Multi-objective Bayesian Optimization With Mixed-categorical Design Variables for Expensive-to-evaluate Aeronautical Applications | Nathalie Bartoli et.al. | 2504.09930 | null |
| 2025-04-14 | Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming | Zhiqiang He et.al. | 2504.09906 | null |
| 2025-04-13 | Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation | Jia Wei et.al. | 2504.09601 | null |
| 2025-04-12 | MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints | Yichao Yuan et.al. | 2504.09345 | null |
| 2025-04-12 | Mixture of Group Experts for Learning Invariant Representations | Lei Kang et.al. | 2504.09265 | null |
| 2025-04-12 | Exploring Modality Disruption in Multimodal Fake News Detection | Moyang Liu et.al. | 2504.09154 | null |
| 2025-05-08 | RouterKT: Mixture-of-Experts for Knowledge Tracing | Han Liao et.al. | 2504.08989 | null |
| 2025-03-23 | ExpertRAG: Efficient RAG with Mixture of Experts – Optimizing Context Retrieval for Adaptive LLM Responses | Esmail Gumaan et.al. | 2504.08744 | null |
| 2025-04-11 | Regularized infill criteria for multi-objective Bayesian optimization with application to aircraft design | Robin Grapin et.al. | 2504.08671 | null |
| 2025-04-11 | Millions of States: Designing a Scalable MoE Architecture with RWKV-7 Meta-learner | Liu Xiao et.al. | 2504.08247 | null |
| 2025-04-10 | C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Zhongyang Li et.al. | 2504.07964 | link |
| 2025-04-11 | Scaling Laws for Native Multimodal Models | Mustafa Shukor et.al. | 2504.07951 | null |
| 2025-04-10 | Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models | Hongcheng Guo et.al. | 2504.07807 | link |
| 2025-04-10 | Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network | Peng Jia et.al. | 2504.07777 | null |
| 2025-04-15 | Kimi-VL Technical Report | Kimi Team et.al. | 2504.07491 | link |
| 2025-04-09 | MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution | Zhe Wang et.al. | 2504.07308 | link |
| 2025-04-11 | Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models | Ling Team et.al. | 2504.07158 | null |
| 2025-05-28 | Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations | Zican Dong et.al. | 2504.06792 | null |
| 2025-04-24 | FedMerge: Federated Personalization via Model Merging | Shutong Chen et.al. | 2504.06768 | null |
| 2025-04-08 | S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning | Hanqing Zeng et.al. | 2504.06426 | null |
| 2025-04-08 | HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Shuzhang Zhong et.al. | 2504.05897 | link |
| 2025-04-08 | Adaptive Substructure-Aware Expert Model for Molecular Property Prediction | Tianyi Jiang et.al. | 2504.05844 | null |
| 2025-04-10 | Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations | Ajay Jaiswal et.al. | 2504.05586 | null |
| 2025-04-07 | SUEDE:Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement | Zuying Xie et.al. | 2504.04818 | null |
| 2025-04-06 | On the Spatial Structure of Mixture-of-Experts in Transformers | Daniel Bershatsky et.al. | 2504.04444 | null |
| 2025-04-05 | Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator | Bing Wang et.al. | 2504.04076 | link |
| 2025-04-04 | HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs | Yongji Wu et.al. | 2504.03871 | null |
| 2025-04-01 | Detecting Financial Fraud with Hybrid Deep Learning: A Mix-of-Experts Approach to Sequential and Anomalous Patterns | Diego Vallarino et.al. | 2504.03750 | null |
| 2025-04-01 | A Unified Virtual Mixture-of-Experts Framework:Enhanced Inference and Hallucination Mitigation in Single-Model System | Mingyan Liu et.al. | 2504.03739 | null |
| 2025-03-26 | A multi-scale lithium-ion battery capacity prediction using mixture of experts and patch-based MLP | Yuzhu Lei et.al. | 2504.03706 | link |
| 2025-04-04 | RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation | Hanbo Bi et.al. | 2504.03166 | null |
| 2025-06-01 | TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models | Xinquan Wang et.al. | 2504.02712 | null |
| 2025-04-07 | MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators | Beichen Huang et.al. | 2504.02658 | link |
| 2025-04-24 | Cognitive Memory in Large Language Models | Lianlei Shan et.al. | 2504.02441 | null |
| 2025-04-23 | MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism | Ruidong Zhu et.al. | 2504.02263 | null |
| 2025-04-20 | Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design | Mohan Zhang et.al. | 2504.01337 | null |
| 2025-04-01 | Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function | Qiuchen Song et.al. | 2504.00819 | null |
| 2025-04-01 | DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism | Dengchun Li et.al. | 2504.00661 | link |
| 2025-04-01 | CFP: Low-overhead Profiling-based Intra-operator Parallelism Generation by Preserving Communication-Free Structures | Weifang Hu et.al. | 2504.00598 | null |
| 2025-04-01 | Continual Cross-Modal Generalization | Yan Xia et.al. | 2504.00561 | null |
| 2025-04-01 | Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection | Shunxin Chen et.al. | 2504.00458 | null |
| 2025-03-31 | Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion | Jiagen Li et.al. | 2503.23721 | null |
| 2025-05-16 | Mixture of Routers | Jia-Chen Zhang et.al. | 2503.23362 | null |
| 2025-05-25 | MoLAE: Mixture of Latent Experts for Parameter-Efficient Language Models | Zehua Liu et.al. | 2503.23100 | null |
| 2025-03-29 | S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning | Giang Do et.al. | 2503.23007 | null |
| 2025-03-29 | Sparse Mixture of Experts as Unified Competitive Learning | Giang Do et.al. | 2503.22996 | null |
| 2025-03-26 | Reasoning Beyond Limits: Advances and Open Problems for LLMs | Mohamed Amine Ferrag et.al. | 2503.22732 | null |
| 2025-04-01 | Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities | Raman Dutt et.al. | 2503.22517 | null |
| 2025-04-29 | RocketPPA: Ultra-Fast LLM-Based PPA Estimator at Code-Level Abstraction | Armin Abdollahi et.al. | 2503.21971 | null |
| 2025-05-08 | Binarity at LOw Metallicity (BLOeM): Enhanced multiplicity of early B-type dwarfs and giants at $Z=0.2\,{\rm Z}_\odot$ | J. I. Villaseñor et.al. | 2503.21936 | null |
| 2025-03-27 | iMedImage Technical Report | Ran Wei et.al. | 2503.21836 | null |
| 2025-03-27 | LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models | Hengyuan Zhao et.al. | 2503.21227 | null |
| 2025-05-17 | MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness | Zihao Zheng et.al. | 2503.21135 | null |
| 2025-03-26 | Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework | Soham Sane et.al. | 2503.20750 | null |
| 2025-03-26 | UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines | Chen Tang et.al. | 2503.20748 | null |
| 2025-03-26 | Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning | Sashuai Zhou et.al. | 2503.20633 | null |
| 2025-04-14 | MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation | Rongyu Zhang et.al. | 2503.20384 | null |
| 2025-03-26 | Modality-Independent Brain Lesion Segmentation with Privacy-aware Continual Learning | Yousef Sadegheih et.al. | 2503.20326 | link |
| 2025-03-31 | Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion | Konyul Park et.al. | 2503.19776 | null |
| 2025-04-30 | BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts | Suzhe Xu et.al. | 2503.19769 | null |
| 2025-03-25 | M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation | Ziyuan Liu et.al. | 2503.19406 | null |
| 2025-04-21 | Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design | Rui Xie et.al. | 2503.18869 | null |
| 2025-04-30 | Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding | Tianyu Chen et.al. | 2503.18578 | null |
| 2025-03-24 | SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking | Wenrui Cai et.al. | 2503.18338 | null |
| 2025-04-01 | Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding | Ze Zhang et.al. | 2503.18104 | link |
| 2025-03-22 | Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM | Codefuse et.al. | 2503.17793 | null |
| 2025-03-25 | Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts | Yike Yuan et.al. | 2503.16057 | null |
| 2025-03-21 | UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations | Debabrata Mandal et.al. | 2503.15868 | null |
| 2025-03-20 | Mixture of Lookup Experts | Shibo Jie et.al. | 2503.15798 | link |
| 2025-03-21 | Leveraging MoE-based Large Language Model for Zero-Shot Multi-Task Semantic Communication | Sin-Yu Huang et.al. | 2503.15722 | null |
| 2025-04-29 | SemEval-2025 Task 1: AdMIRe – Advancing Multimodal Idiomaticity Representation | Thomas Pickard et.al. | 2503.15358 | null |
| 2025-03-21 | Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition | Seungyeon Cho et.al. | 2503.14960 | null |
| 2025-03-18 | Core-Periphery Principle Guided State Space Model for Functional Connectome Classification | Minheng Chen et.al. | 2503.14655 | null |
| 2025-03-18 | DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers | Minglei Shi et.al. | 2503.14487 | null |
| 2025-03-18 | MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts | Runqi Meng et.al. | 2503.14355 | null |
| 2025-03-18 | Frac-Connections: Fractional Extension of Hyper-Connections | Defa Zhu et.al. | 2503.14125 | null |
| 2025-03-18 | SNAKE: A Sustainable and Multi-functional Traffic Analysis System utilizing Specialized Large-Scale Models with a Mixture of Experts Architecture | Tian Qin et.al. | 2503.13808 | null |
| 2025-03-13 | Ensemble Learning for Large Language Models in Text and Code Generation: A Survey | Mari Ashiga et.al. | 2503.13505 | null |
| 2025-03-17 | Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge | Shengling Qin et.al. | 2503.13421 | null |
| 2025-05-10 | Channel Estimation for Pinching-Antenna Systems (PASS) | Jian Xiao et.al. | 2503.13268 | null |
| 2025-03-17 | Federated Mixture-of-Expert for Non-Overlapped Cross-Domain Sequential Recommendation | Yu Liu et.al. | 2503.13254 | null |
| 2025-05-21 | Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps | Mohammad Al-Jarrah et.al. | 2503.12633 | link |
| 2025-03-16 | MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts | Harshit et.al. | 2503.12592 | null |
| 2025-03-16 | MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification | Jianwei Zhao et.al. | 2503.12401 | null |
| 2025-05-10 | Adaptive Mixture of Low-Rank Experts for Robust Audio Spoofing Detection | Qixian Chen et.al. | 2503.12010 | null |
| 2025-03-14 | FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-the-World LoRA | Jieming Bian et.al. | 2503.11880 | null |
| 2025-03-10 | MELON: Multimodal Mixture-of-Experts with Spectral-Temporal Fusion for Long-Term Mobility Estimation in Critical Care | Jiaqing Zhang et.al. | 2503.11695 | null |
| 2025-03-14 | A Review of DeepSeek Models’ Key Innovative Techniques | Chengen Wang et.al. | 2503.11486 | null |
| 2025-03-14 | MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling | Rachel S. Y. Teo et.al. | 2503.11144 | link |
| 2025-03-13 | Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores | Chenpeng Wu et.al. | 2503.10725 | link |
| 2025-05-19 | dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis | Luyuan Xie et.al. | 2503.10412 | null |
| 2025-04-10 | Continual Text-to-Video Retrieval with Frame Fusion and Task-Aware Routing | Zecheng Zhao et.al. | 2503.10111 | link |
| 2025-03-12 | MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching | Tairan Xu et.al. | 2503.09716 | null |
| 2025-03-12 | Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework | Bakary Badjie et.al. | 2503.09504 | null |
| 2025-03-12 | Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment | Nazanin Moradinasab et.al. | 2503.09498 | link |
| 2025-04-01 | Astrea: A MOE-based Visual Understanding Model with Progressive Alignment | Xiaoda Yang et.al. | 2503.09445 | null |
| 2025-03-12 | Automatic Operator-level Parallelism Planning for Distributed Deep Learning – A Mixed-Integer Programming Approach | Ruifeng She et.al. | 2503.09357 | null |
| 2025-03-12 | Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference | Mohammad Siavashi et.al. | 2503.09304 | null |
| 2025-03-13 | FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models | Fufangchen Zhao et.al. | 2503.09158 | null |
| 2025-05-22 | MoE-Loco: Mixture of Experts for Multitask Locomotion | Runhan Huang et.al. | 2503.08564 | null |
| 2025-03-11 | BoundarEase: Fostering Constructive Community Engagement to Inform More Equitable Student Assignment Policies | Cassandra Overney et.al. | 2503.08543 | link |
| 2025-03-11 | Accelerating MoE Model Inference with Expert Sharding | Oana Balmau et.al. | 2503.08467 | null |
| 2025-03-26 | Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models | Junzhe Li et.al. | 2503.08120 | null |
| 2025-03-11 | MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models | Han Zhao et.al. | 2503.08007 | null |
| 2025-03-10 | Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM | Yongqiang Yao et.al. | 2503.07680 | null |
| 2025-04-01 | TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster | Kanghui Ning et.al. | 2503.07649 | null |
| 2025-03-05 | BrainNet-MoE: Brain-Inspired Mixture-of-Experts Learning for Neurological Disease Identification | Jing Zhang et.al. | 2503.07640 | null |
| 2025-03-05 | Mixture of Experts Made Intrinsically Interpretable | Xingyi Yang et.al. | 2503.07639 | null |
| 2025-03-26 | GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts | Minwen Liao et.al. | 2503.07417 | null |
| 2025-04-18 | A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications | Siyuan Mu et.al. | 2503.07137 | link |
| 2025-03-10 | VMTS: Vision-Assisted Teacher-Student Reinforcement Learning for Multi-Terrain Locomotion in Bipedal Robots | Fu Chen et.al. | 2503.07049 | link |
| 2025-03-10 | ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration | Mengting Ai et.al. | 2503.06881 | link |
| 2025-03-10 | eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference | Suraiya Tairin et.al. | 2503.06823 | null |
| 2025-03-09 | MoFE: Mixture of Frozen Experts Architecture | Jean Seo et.al. | 2503.06491 | null |
| 2025-03-25 | Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models | Nguyen Do et.al. | 2503.06413 | link |
| 2025-03-08 | MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering | Vinay Kumar Verma et.al. | 2503.06296 | null |
| 2025-03-08 | A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts | Wenzhuo Du et.al. | 2503.06064 | null |
| 2025-03-08 | MANDARIN: Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients: Development and Validation of an Acute Brain Dysfunction Prediction Model | Miguel Contreras et.al. | 2503.06059 | null |
| 2025-03-08 | GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices | Xudong Lu et.al. | 2503.06019 | null |
| 2025-03-03 | How Do Consumers Really Choose: Exposing Hidden Preferences with the Mixture of Experts Model | Diego Vallarino et.al. | 2503.05800 | null |
| 2025-03-11 | Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning | Justin Chih-Yao Chen et.al. | 2503.05641 | null |
| 2025-03-07 | FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework | Jingyu Xu et.al. | 2503.05626 | null |
| 2025-04-15 | Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts | Weigao Sun et.al. | 2503.05447 | link |
| 2025-03-10 | Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs | Ling Team et.al. | 2503.05139 | null |
| 2025-03-07 | Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts | Shwai He et.al. | 2503.05066 | null |
| 2025-03-06 | Continual Pre-training of MoEs: How robust is your router? | Benjamin Thérien et.al. | 2503.05029 | null |
| 2025-02-25 | Comparative Analysis Based on DeepSeek, ChatGPT, and Google Gemini: Features, Techniques, Performance, Future Prospects | Anichur Rahman et.al. | 2503.04783 | null |
| 2025-03-19 | Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining | Houyi Li et.al. | 2503.04715 | null |
| 2025-03-07 | Question-Aware Gaussian Experts for Audio-Visual Question Answering | Hongyeob Kim et.al. | 2503.04459 | link |
| 2025-03-19 | Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling | Yan Li et.al. | 2503.04398 | null |
| 2025-03-06 | A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery | Yiheng Zhu et.al. | 2503.04362 | null |
| 2025-03-06 | Quantum metric induced magneto-optical effects in $\mathcal{PT}$-symmetric antiferromagnets | Yongpan Li et.al. | 2503.04312 | null |
| 2025-03-06 | DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval | Yating Liu et.al. | 2503.04144 | null |
| 2025-03-05 | VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection | Enkhtogtokh Togootogtokh et.al. | 2503.03797 | link |
| 2025-03-09 | Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs | Haoran Fan et.al. | 2503.03594 | link |
| 2025-03-06 | Convergence Rates for Softmax Gating Mixture of Experts | Huy Nguyen et.al. | 2503.03213 | null |
| 2025-03-04 | MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation | Weihang Wang et.al. | 2503.02799 | link |
| 2025-03-04 | FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting | Congluo Xu et.al. | 2503.02692 | null |
| 2025-03-06 | Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer | Yujiao Yang et.al. | 2503.02495 | link |
| 2025-03-04 | Tabby: Tabular Data Synthesis with Language Models | Sonia Cromp et.al. | 2503.02152 | null |
| 2025-03-03 | ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition | Nastaran Mansourian et.al. | 2503.01750 | null |
| 2025-03-03 | Effective High-order Graph Representation Learning for Credit Card Fraud Detection | Yao Zou et.al. | 2503.01556 | null |
| 2025-03-03 | DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models | Yongqi Huang et.al. | 2503.01359 | null |
| 2025-03-03 | PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation | Linhai Zhang et.al. | 2503.01303 | null |
| 2025-03-03 | Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting | Xiaobin Hong et.al. | 2503.01157 | null |
| 2025-03-02 | Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion | Daiki Nishiyama et.al. | 2503.00925 | null |
| 2025-03-01 | Efficiently Editing Mixture-of-Experts Models with Compressed Experts | Yifei He et.al. | 2503.00634 | null |
| 2025-03-01 | CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | Tianyu Huai et.al. | 2503.00413 | null |
| 2025-02-28 | CoSMoEs: Compact Sparse Mixture of Experts | Patrick Huber et.al. | 2503.00245 | null |
| 2025-02-26 | Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos | Jiamin Luo et.al. | 2503.00049 | null |
| 2025-03-01 | R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts | Zhongyang Li et.al. | 2502.20395 | link |
| 2025-02-27 | Mixture of Experts for Recognizing Depression from Interview and Reading Tasks | Loukas Ilias et.al. | 2502.20213 | null |
| 2025-02-27 | Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems | Zeyi Ren et.al. | 2502.20183 | null |
| 2025-02-27 | UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook | Yidi Jiang et.al. | 2502.20067 | null |
| 2025-02-27 | AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs | Xuyang Wei et.al. | 2502.20035 | link |
| 2025-03-04 | Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Shulai Zhang et.al. | 2502.19811 | link |
| 2025-02-27 | Extension of SUSY SU(5) GUTs with Nelson-Barr models | Junji Hisano et.al. | 2502.19686 | null |
| 2025-03-15 | Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization | Taishi Nakamura et.al. | 2502.19261 | null |
| 2025-02-26 | OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment | Jiaxin Deng et.al. | 2502.18965 | null |
| 2025-02-26 | Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM | Junxiao Ma et.al. | 2502.18863 | null |
| 2025-02-25 | Generative AI-enabled Wireless Communications for Robust Low-Altitude Economy Networking | Changyuan Zhao et.al. | 2502.18118 | null |
| 2025-02-09 | MoEMba: A Mamba-based Mixture of Experts for High-Density EMG-based Hand Gesture Recognition | Mehran Shabanpour et.al. | 2502.17457 | null |
| 2025-03-17 | The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE | Andrei Chernov et.al. | 2502.17391 | null |
| 2025-02-24 | Delta Decompression for MoE-based LLMs Compression | Hao Gu et.al. | 2502.17298 | link |
| 2025-02-24 | Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks | Andrei Chernov et.al. | 2502.17187 | null |
| 2025-02-24 | Muon is Scalable for LLM Training | Jingyuan Liu et.al. | 2502.16982 | link |
| 2025-03-07 | BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference | Zewen Jin et.al. | 2502.16927 | null |
| 2025-02-24 | ENACT-Heart – ENsemble-based Assessment Using CNN and Transformer on Heart Sounds | Jiho Han et.al. | 2502.16914 | null |
| 2025-02-26 | Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | Chenghao Fan et.al. | 2502.16894 | null |
| 2025-02-22 | An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning | Masoud Shokrnezhad et.al. | 2502.16198 | null |
| 2025-02-20 | A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models | Mengyang Sun et.al. | 2502.15828 | link |
| 2025-03-20 | Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models | Yuan Sun et.al. | 2502.15451 | link |
| 2025-03-02 | Tight Clusters Make Specialized Experts | Stefan K. Nielsen et.al. | 2502.15315 | link |
| 2025-02-21 | Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction | Baohang Zhou et.al. | 2502.15290 | link |
| 2025-02-20 | Ray-Tracing for Conditionally Activated Neural Networks | Claudio Gallicchio et.al. | 2502.14788 | null |
| 2025-02-21 | ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Zhongyi Zhou et.al. | 2502.14420 | null |
| 2025-02-19 | MoM: Linear Sequence Modeling with Mixture-of-Memories | Jusen Du et.al. | 2502.13685 | link |
| 2025-02-19 | Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts | Xin Li et.al. | 2502.13577 | null |
| 2025-02-18 | MoBA: Mixture of Block Attention for Long-Context LLMs | Enzhe Lu et.al. | 2502.13189 | link |
| 2025-02-18 | Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models | Gyeongman Kim et.al. | 2502.12947 | null |
| 2025-03-13 | DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs | Minxuan Lv et.al. | 2502.12455 | null |
| 2025-02-17 | From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs | Kumari Nishu et.al. | 2502.12325 | null |
| 2025-02-17 | Binarity at LOw Metallicity (BLOeM): Multiplicity of early B-type supergiants in the Small Magellanic Cloud | N. Britavskiy et.al. | 2502.12239 | null |
| 2025-02-17 | Accurate Expert Predictions in MoE Inference via Cross-Layer Gate | Zhiyuan Fang et.al. | 2502.12224 | null |
| 2025-02-17 | How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | Ayan Sengupta et.al. | 2502.12051 | null |
| 2025-02-17 | Connector-S: A Survey of Connectors in Multi-modal Large Language Models | Xun Zhu et.al. | 2502.11453 | null |
| 2025-02-16 | Mixture of Tunable Experts – Behavior Modification of DeepSeek-R1 at Inference Time | Robert Dahlke et.al. | 2502.11096 | null |
| 2025-02-16 | ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models | Shixuan Li et.al. | 2502.11059 | null |
| 2025-02-15 | Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization | Matthew Lyle Olson et.al. | 2502.10928 | null |
| 2025-02-11 | MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | Sungnyun Kim et.al. | 2502.10447 | null |
| 2025-04-03 | Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution | Bowen Chen et.al. | 2502.09654 | null |
| 2025-02-14 | Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting | Nicholas Dronen et.al. | 2502.09500 | link |
| 2025-02-12 | The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities | Ning Li et.al. | 2502.08381 | null |
| 2025-02-12 | Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification | Xuanze Chen et.al. | 2502.08083 | null |
| 2025-03-09 | Training Sparse Mixture Of Experts Text Embedding Models | Zach Nussbaum et.al. | 2502.07972 | link |
| 2025-02-11 | Memory Analysis on the Training Course of DeepSeek Models | Ping Zhang et.al. | 2502.07846 | null |
| 2025-02-11 | LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid | Weigao Sun et.al. | 2502.07563 | link |
| 2025-02-11 | MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks | Lotfi Abdelkrim Mecharbat et.al. | 2502.07422 | null |
| 2025-02-11 | Online Aggregation of Trajectory Predictors | Alex Tong et.al. | 2502.07178 | null |
| 2025-02-09 | Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | Zhiyuan Fang et.al. | 2502.06888 | null |
| 2025-02-12 | Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach | Xu Zhang et.al. | 2502.06832 | null |
| 2025-02-10 | MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing | Seokjin Go et.al. | 2502.06643 | null |
| 2025-02-10 | Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Haiduo Huang et.al. | 2502.06282 | link |
| 2025-02-10 | Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models | Peiran Wang et.al. | 2502.06094 | null |
| 2025-02-08 | Mol-MoE: Training Preference-Guided Routers for Molecule Generation | Diego Calanzone et.al. | 2502.05633 | null |
| 2025-02-17 | UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA | Jiale Dong et.al. | 2502.05602 | link |
| 2025-02-07 | fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | Hanfei Yu et.al. | 2502.05370 | null |
| 2025-02-07 | Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts | Roussel Desmond Nzoyem et.al. | 2502.05335 | null |
| 2025-02-19 | Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient | Jan Ludziejewski et.al. | 2502.05172 | null |
| 2025-02-06 | Mixture of neural operator experts for learning boundary conditions and model selection | Dwyer Deighan et.al. | 2502.04562 | null |
| 2025-02-06 | CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Zehua Pei et.al. | 2502.04416 | link |
| 2025-02-06 | Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning | Peizhuang Cong et.al. | 2502.03884 | null |
| 2025-03-20 | A Retrospective Systematic Study on Hierarchical Sparse Query Transformer-assisted Ultrasound Screening for Early Hepatocellular Carcinoma | Chaoyin She et.al. | 2502.03772 | link |
| 2025-02-05 | (GG) MoE vs. MLP on Tabular Data | Andrei Chernov et.al. | 2502.03608 | null |
| 2025-02-05 | RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts | Tuan Truong et.al. | 2502.03044 | null |
| 2025-03-22 | On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation | Nghiem T. Diep et.al. | 2502.03029 | null |
| 2025-02-05 | Scaling Laws for Upcycling Mixture-of-Experts Language Models | Seng Pei Liew et.al. | 2502.03009 | null |
| 2025-02-04 | ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals | Jianan Nie et.al. | 2502.02748 | null |
| 2025-02-04 | Binarity at LOw Metallicity (BLOeM): The multiplicity properties and evolution of BAF-type supergiants | L. R. Patrick et.al. | 2502.02644 | null |
| 2025-02-04 | Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism | Yuhao Qing et.al. | 2502.02581 | null |
| 2025-02-07 | Brief analysis of DeepSeek R1 and its implications for Generative AI | Sarah Mercer et.al. | 2502.02523 | null |
| 2025-02-04 | M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference | Nikhil Bhendawade et.al. | 2502.02040 | null |
| 2025-02-07 | MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation | Haibo Tong et.al. | 2502.01719 | null |
| 2025-02-27 | Omni-Mol: Exploring Universal Convergent Space for Omni-Molecular Tasks | Chengxin Hu et.al. | 2502.01074 | null |
| 2025-02-17 | MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs | Yuhang Zhou et.al. | 2502.00997 | null |
| 2025-02-03 | CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling | Xinze Wang et.al. | 2502.00965 | null |
| 2025-02-02 | UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs | Yufei He et.al. | 2502.00806 | null |
| 2025-02-02 | Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective | Yujin Oh et.al. | 2502.00619 | null |
| 2025-02-05 | Weak-to-Strong Diffusion with Reflection | Lichen Bai et.al. | 2502.00473 | null |
| 2025-02-01 | PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning | Yu Feng et.al. | 2502.00354 | link |
| 2025-02-01 | Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective | Fanqi Yan et.al. | 2502.00281 | null |
| 2025-01-31 | Pheromone-based Learning of Optimal Reasoning Paths | Anirudh Chari et.al. | 2501.19278 | null |
| 2025-03-03 | Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning | Minh Le et.al. | 2501.18936 | null |
| 2025-01-30 | MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability | Yan Sun et.al. | 2501.18439 | null |
| 2025-02-10 | Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework | Jung-Hua Liu et.al. | 2501.17903 | null |
| 2025-01-29 | Heuristic-Informed Mixture of Experts for Link Prediction in Multilayer Networks | Lucio La Cava et.al. | 2501.17557 | null |
| 2025-01-28 | 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow | Yueen Ma et.al. | 2501.16698 | null |
| 2025-01-27 | Searching for GEMS: Discovery and Characterization of Two Brown Dwarfs Around M Dwarfs | Alexander Larsen et.al. | 2501.16554 | null |
| 2025-02-12 | One-for-All Does Not Work! Enhancing Vulnerability Detection by Mixture-of-Experts (MoE) | Xu Yang et.al. | 2501.16454 | null |
| 2025-01-29 | Mixture of Experts (MoE): A Big Data Perspective | Wensheng Gan et.al. | 2501.16352 | null |
| 2025-01-27 | Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference | Yinghan Li et.al. | 2501.16103 | null |
| 2025-01-25 | ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning | Shangqian Gao et.al. | 2501.15316 | null |
| 2025-03-16 | FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts | Ziqi Liu et.al. | 2501.15125 | link |
| 2025-01-25 | Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning | Ziyu Zhao et.al. | 2501.15103 | null |
| 2025-01-24 | Mean-field limit from general mixtures of experts to quantum neural networks | Anderson Melchor Hernandez et.al. | 2501.14660 | null |
| 2025-01-30 | Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation | Shengzhe Zhang et.al. | 2501.14269 | link |
| 2025-03-12 | Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images | Zeyun Deng et.al. | 2501.14198 | null |
| 2025-01-23 | CSAOT: Cooperative Multi-Agent System for Active Object Tracking | Hy Nguyen et.al. | 2501.13994 | null |
| 2025-01-22 | Autonomy-of-Experts Models | Ang Lv et.al. | 2501.13074 | null |
| 2025-02-07 | LLM4WM: Adapting LLM for Wireless Multi-Tasking | Xuanyu Liu et.al. | 2501.12983 | null |
| 2025-01-22 | UniUIR: Considering Underwater Image Restoration as An All-in-One Learner | Xu Zhang et.al. | 2501.12981 | null |
| 2025-01-22 | BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR | Guodong Ma et.al. | 2501.12602 | null |
| 2025-02-26 | Modality Interactive Mixture-of-Experts for Fake News Detection | Yifan Liu et.al. | 2501.12431 | link |
| 2025-01-21 | SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection | Xiaocheng Zhang et.al. | 2501.12430 | null |
| 2025-01-25 | Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models | Samira Abnar et.al. | 2501.12370 | null |
| 2025-01-21 | MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks | Qishen Zhou et.al. | 2501.12281 | link |
| 2025-02-04 | Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models | Zihan Qiu et.al. | 2501.11873 | null |
| 2025-01-18 | FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models | Xinglin Pan et.al. | 2501.10714 | null |
| 2024-12-16 | DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference | Yujie Zhang et.al. | 2501.10375 | null |
| 2025-01-17 | OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning | Jinyuan Feng et.al. | 2501.10062 | null |
| 2025-01-17 | LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading | Kuan-Ming Liu et.al. | 2501.09636 | null |
| 2025-01-16 | MoE$^2$: Optimizing Collaborative Inference for Edge Large Language Models | Lyudong Jin et.al. | 2501.09410 | null |
| 2025-01-14 | MiniMax-01: Scaling Foundation Models with Lightning Attention | MiniMax et.al. | 2501.08313 | null |
| 2025-01-14 | Guiding polaritonic energy and momentum through two-dimensional Bravais lattices | Zhonglin Li et.al. | 2501.08123 | null |
| 2025-02-11 | GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism | Chen Tang et.al. | 2501.07890 | null |
| 2025-01-18 | PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration | Xiaoshui Huang et.al. | 2501.07762 | null |
| 2025-01-13 | A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis | Binyu Zhang et.al. | 2501.07016 | link |
| 2025-01-12 | Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning | Hanwen Zhong et.al. | 2501.06884 | link |
| 2025-01-12 | A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context | Noureldin Zahran et.al. | 2501.06859 | null |
| 2025-03-18 | TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning | Yinghao Zhu et.al. | 2501.05661 | link |
| 2025-01-09 | Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing | Mengfan Liu et.al. | 2501.05313 | null |
| 2025-01-07 | LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes | Xiang Xu et.al. | 2501.04004 | link |
| 2025-01-07 | mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training | Xudong Liao et.al. | 2501.03905 | null |
| 2025-01-08 | Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection | Donatella Genovese et.al. | 2501.03432 | null |
| 2025-01-06 | Solving the Porous Medium Equation with the eXtreme Mesh deformation approach (X-Mesh) | Alexandre Chemin et.al. | 2501.03083 | null |
| 2025-01-05 | Soft and Compliant Contact-Rich Hair Manipulation and Care | Uksang Yoo et.al. | 2501.02630 | null |
| 2025-01-12 | Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning | Zhongyi Zhou et.al. | 2501.02198 | null |
| 2025-03-18 | MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders | Jiajun Cao et.al. | 2501.01709 | null |
| 2025-01-01 | REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization | Huyen Nguyen et.al. | 2501.00779 | null |
| 2025-01-06 | Superposition in Transformers: A Novel Way of Building Mixture of Experts | Ayoub Ben Chaliah et.al. | 2501.00530 | link |
| 2024-12-31 | CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection | Xiaolei Wang et.al. | 2501.00346 | null |
| 2024-12-30 | SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection | Yuxuan Li et.al. | 2412.20665 | link |
| 2024-12-29 | Multimodal Variational Autoencoder: a Barycentric View | Peijie Qiu et.al. | 2412.20487 | null |
| 2025-03-05 | A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement | Sidra Nasir et.al. | 2412.20468 | null |
| 2024-12-29 | Mind the Data Gap: Bridging LLMs to Enterprise Data Integration | Moe Kayali et.al. | 2412.20331 | null |
| 2025-03-09 | UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity | Jingbo Lin et.al. | 2412.20157 | link |
| 2024-12-28 | Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection | Yaning Zhang et.al. | 2412.20156 | null |
| 2025-02-18 | DeepSeek-V3 Technical Report | DeepSeek-AI et.al. | 2412.19437 | link |
| 2024-12-26 | AskChart: Universal Chart Understanding through Textual Enhancement | Xudong Yang et.al. | 2412.19146 | link |
| 2024-12-30 | Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection | Xiaoyu Huang et.al. | 2412.19108 | null |
| 2024-12-26 | DAPoinTr: Domain Adaptive Point Transformer for Point Cloud Completion | Yinghui Li et.al. | 2412.19062 | link |
| 2025-03-10 | Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making | David Shoresh et.al. | 2412.18593 | link |
| 2024-12-24 | BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing | Yingjie Ma et.al. | 2412.18065 | link |
| 2024-12-23 | UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition | Li Fu et.al. | 2412.17507 | null |
| 2025-02-01 | BrainMAP: Learning Multiple Activation Pathways in Brain Networks | Song Wang et.al. | 2412.17404 | link |
| 2024-12-23 | Efficient fine-tuning methodology of text embedding models for information retrieval: contrastive learning penalty (clp) | Jeongsu Yu et.al. | 2412.17364 | link |
| 2024-12-22 | The Fermat curves and arrangements of lines and conics | Nils Peder Astrup Toft et.al. | 2412.16993 | null |
| 2024-12-22 | Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models | Elie Antoine et.al. | 2412.16971 | null |
| 2024-12-18 | GraphLoRA: Empowering LLMs Fine-Tuning via Graph Collaboration of MoE | Ting Bai et.al. | 2412.16216 | null |
| 2024-12-20 | Theory of Mixture-of-Experts for Mobile Edge Computing | Hongbo Li et.al. | 2412.15690 | null |
| 2024-12-19 | MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale | Swapnil Gandhi et.al. | 2412.15411 | null |
| 2025-01-03 | Qwen2.5 Technical Report | Qwen et.al. | 2412.15115 | link |
| 2025-02-27 | ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing | Ziteng Wang et.al. | 2412.14711 | link |
| 2025-01-22 | A Survey on Inference Optimization Techniques for Mixture of Experts Models | Jiacheng Liu et.al. | 2412.14219 | link |
| 2024-12-18 | SEKE: Specialised Experts for Keyword Extraction | Matej Martinc et.al. | 2412.14087 | link |
| 2024-12-18 | MedCoT: Medical Chain of Thought via Hierarchical Expert | Jiaxiang Liu et.al. | 2412.13736 | link |
| 2024-12-17 | SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks | Mátyás Vincze et.al. | 2412.13053 | null |
| 2024-12-17 | Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Moritz Reuss et.al. | 2412.12953 | null |
| 2025-01-09 | CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition | He Wang et.al. | 2412.12760 | null |
| 2024-12-16 | Investigating Mixture of Experts in Dense Retrieval | Effrosyni Sokli et.al. | 2412.11864 | null |
| 2024-12-20 | Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture | Jingze Shi et.al. | 2412.11834 | link |
| 2024-12-16 | Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation | Svetlana Pavlitska et.al. | 2412.11608 | link |
| 2024-12-16 | Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture | Jingyu Xu et.al. | 2412.11557 | null |
| 2024-12-14 | DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification | Yuhao Wang et.al. | 2412.10650 | link |
| 2024-12-13 | DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Zhiyu Wu et.al. | 2412.10302 | link |
| 2024-12-13 | Llama 3 Meets MoE: Efficient Upcycling | Aditya Vavre et.al. | 2412.09952 | link |
| 2024-12-20 | Memory Layers at Scale | Vincent-Pierre Berges et.al. | 2412.09764 | link |
| 2025-01-10 | Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Xiaoshuang Huang et.al. | 2412.09278 | link |
| 2024-12-12 | MoSLD: An Extremely Parameter-Efficient Mixture-of-Shared LoRAs for Multi-Task Learning | Lulu Zhao et.al. | 2412.08946 | null |
| 2024-11-26 | Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection | Tzu-Ting Yang et.al. | 2412.08651 | null |
| 2025-01-18 | Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective | Minh Le et.al. | 2412.08285 | null |
| 2025-02-12 | Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification | Xuanze Chen et.al. | 2412.08193 | link |
| 2024-12-10 | MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning | Yufei Ma et.al. | 2412.07405 | null |
| 2024-12-10 | Post-Training Statistical Calibration for Higher Activation Sparsity | Vui Seng Chua et.al. | 2412.07174 | link |
| 2025-03-02 | MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | Yao Fu et.al. | 2412.07067 | null |
| 2024-12-07 | Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts | Arturo Rodriguez et.al. | 2412.06842 | null |
| 2024-12-09 | Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset | Xiao Wang et.al. | 2412.06647 | link |
| 2024-12-09 | UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts | Zhen Wan et.al. | 2412.06340 | null |
| 2024-12-08 | Hallucination-aware Optimization for Large Language Model-empowered Communications | Yinqiu Liu et.al. | 2412.06007 | link |
| 2024-12-10 | An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism | Qing Zhang et.al. | 2412.05821 | null |
| 2024-12-10 | RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts | Xu Liu et.al. | 2412.05679 | link |
| 2024-12-07 | SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts | Gengze Zhou et.al. | 2412.05552 | link |
| 2024-12-07 | Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers | Boxun Xu et.al. | 2412.05540 | null |
| 2024-12-23 | Steps are all you need: Rethinking STEM Education with Prompt Engineering | Krishnasai Addala et.al. | 2412.05023 | null |
| 2024-12-05 | Customize Segment Anything Model for Multi-Modal Semantic Segmentation with Mixture of LoRA Experts | Chenyang Zhu et.al. | 2412.04220 | null |
| 2025-03-02 | Monet: Mixture of Monosemantic Experts for Transformers | Jungwoo Park et.al. | 2412.04139 | link |
| 2024-12-05 | Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks | Zhaoyang Liu et.al. | 2412.03850 | null |
| 2024-12-04 | Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond | Loukas Ilias et.al. | 2412.03483 | null |
| 2024-12-03 | CA-MoE: Channel-Adapted MoE for Incremental Weather Forecasting | Hao Chen et.al. | 2412.02503 | null |
| 2025-02-14 | MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption | Siddhant Dutta et.al. | 2412.01858 | null |
| 2025-01-22 | Yi-Lightning Technical Report | Alan Wake et.al. | 2412.01253 | null |
| 2024-11-30 | Mixture of Experts for Node Classification | Yu Shi et.al. | 2412.00418 | null |
| 2025-01-22 | HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting | Shaohan Yu et.al. | 2412.00316 | null |
| 2024-11-27 | Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference | Andrii Skliar et.al. | 2412.00099 | null |
| 2025-02-16 | Condense, Don’t Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning | Mingyu Cao et.al. | 2412.00069 | link |
| 2024-11-29 | LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References | Shuguo Jiang et.al. | 2411.19758 | null |
| 2024-11-28 | On the effectiveness of discrete representations in sparse mixture of experts | Giang Do et.al. | 2411.19402 | null |
| 2024-11-28 | Bayesian Cluster Weighted Gaussian Models | Panagiotis Papastamoulis et.al. | 2411.18957 | link |
| 2024-11-27 | UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS | Haomin Zhuang et.al. | 2411.18797 | null |
| 2024-11-27 | Complexity Experts are Task-Discriminative Learners for Any Image Restoration | Eduard Zamfir et.al. | 2411.18466 | null |
| 2024-11-27 | Mixture of Experts in Image Classification: What’s the Sweet Spot? | Mathurin Videau et.al. | 2411.18322 | null |
| 2024-11-26 | $H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs | Selim Furkan Tekin et.al. | 2411.17792 | link |
| 2024-11-26 | The Tempered Finite Element Method | Antoine Quiriny et.al. | 2411.17564 | null |
| 2024-11-25 | Staleness-Centric Optimizations for Efficient Diffusion MoE Inference | Jiajun Luo et.al. | 2411.16786 | null |
| 2024-12-02 | MH-MoE: Multi-Head Mixture-of-Experts | Shaohan Huang et.al. | 2411.16205 | null |
| 2024-11-25 | LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy | Peng Cui et.al. | 2411.16095 | null |
| 2024-11-24 | Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution | Haiquan Wang et.al. | 2411.15871 | null |
| 2024-11-24 | LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Xiaoye Qu et.al. | 2411.15708 | link |
| 2024-11-23 | Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts | Qizhou Chen et.al. | 2411.15432 | null |
| 2024-11-23 | Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation | Fahao Chen et.al. | 2411.15419 | null |
| 2024-11-21 | Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning | Jiange Yang et.al. | 2411.14519 | null |
| 2024-11-20 | MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification | Yuxuan Chen et.al. | 2411.13004 | null |
| 2024-11-23 | KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning | Ming Yin et.al. | 2411.12950 | null |
| 2025-02-06 | Ultra-Sparse Memory Network | Zihao Huang et.al. | 2411.12364 | null |
| 2025-01-28 | CNMBERT: A Model for Converting Hanyu Pinyin Abbreviations to Chinese Characters | Zishuo Feng et.al. | 2411.11770 | link |
| 2024-11-18 | MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs | Shiyi Cao et.al. | 2411.11217 | null |
| 2024-11-16 | Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | Jinqiang Long et.al. | 2411.10669 | link |
| 2024-11-15 | Weakly-Supervised Multimodal Learning on MIMIC-CXR | Andrea Agostini et.al. | 2411.10356 | link |
| 2024-11-21 | Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models | Wei Wang et.al. | 2411.10003 | null |
| 2024-11-13 | Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection | Vima Gupta et.al. | 2411.08982 | null |
| 2024-11-13 | Sparse Upcycling: Inference Inefficient Finetuning | Sasha Doubov et.al. | 2411.08968 | null |
| 2024-11-13 | LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing | Xiaonan Nie et.al. | 2411.08446 | null |
| 2024-11-12 | Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach | Renzi Wang et.al. | 2411.08232 | null |
| 2024-11-12 | PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model | Yilun Liu et.al. | 2411.08212 | null |
| 2024-11-08 | Biodynamic Analysis of Alpine Skiing with a Skier-Ski-Snow Interaction Model | Nan Gao et.al. | 2411.08056 | null |
| 2024-11-12 | Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge | Emmanuel Azuh Mensah et.al. | 2411.07834 | null |
| 2024-11-11 | Adaptive Conditional Expert Selection Network for Multi-domain Recommendation | Kuiyao Dong et.al. | 2411.06826 | null |
| 2024-11-11 | WDMoE: Wireless Distributed Mixture of Experts for Large Language Models | Nan Xue et.al. | 2411.06681 | null |
| 2024-11-09 | Learning Mixtures of Experts with EM | Quentin Fruytier et.al. | 2411.06056 | null |
| 2024-11-08 | NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts | Yen-Ting Lin et.al. | 2411.05945 | null |
| 2024-11-05 | DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts | Zelin Yao et.al. | 2411.03025 | link |
| 2024-11-05 | Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts | Yuan Xie et.al. | 2411.02787 | null |
| 2024-11-27 | SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models | Jianyi Zhang et.al. | 2411.02433 | link |
| 2024-11-06 | Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Xingwu Sun et.al. | 2411.02265 | null |
| 2024-12-27 | FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation | Ziwei Zhan et.al. | 2411.02115 | null |
| 2024-11-06 | Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis | Mohammad Zbeeb et.al. | 2411.01929 | link |
| 2025-02-10 | RS-MoE: A Vision-Language Model with Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering | Hui Lin et.al. | 2411.01595 | null |
| 2025-02-10 | Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation | Mingrui Liu et.al. | 2411.01457 | null |
| 2024-11-06 | HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference | Peng Tang et.al. | 2411.01433 | null |
| 2024-12-12 | HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy | Shuqing Luo et.al. | 2411.01288 | link |
| 2024-11-02 | PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment | Dongxu Liu et.al. | 2411.01245 | null |
| 2024-11-01 | MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition | Cheng Yang et.al. | 2411.01016 | null |
| 2024-11-01 | LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Nam V. Nguyen et.al. | 2411.00918 | link |
| 2024-10-16 | TradExpert: Revolutionizing Trading with Mixture of Expert LLMs | Qianggang Ding et.al. | 2411.00782 | null |
| 2024-11-01 | MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization | Jingming Guo et.al. | 2411.00662 | link |
| 2024-11-01 | A Fast, Analytic Empirical Model of the Gaia Data Release 3 Astrometric Orbit Catalog Selection Function | Casey Y. Lam et.al. | 2411.00654 | link |
| 2024-10-31 | Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts | Xiang Deng et.al. | 2410.23836 | null |
| 2024-10-30 | Efficient and Interpretable Grammatical Error Correction with Mixture of Experts | Muhammad Reza Qorib et.al. | 2410.23507 | link |
| 2024-10-30 | Stealing User Prompts from Mixture of Experts | Itay Yona et.al. | 2410.22884 | null |
| 2024-10-30 | MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning | Xujia Wang et.al. | 2410.22782 | null |
| 2025-02-08 | ProMoE: Fast MoE-based LLM Serving using Proactive Caching | Xiaoniu Song et.al. | 2410.22134 | null |
| 2024-10-29 | Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging | Li Shen et.al. | 2410.21804 | null |
| 2024-10-29 | Neural Experts: Mixture of Experts for Implicit Neural Representations | Yizhak Ben-Shabat et.al. | 2410.21643 | null |
| 2024-11-07 | FinTeamExperts: Role Specialized MOEs For Financial Analysis | Yue Yu et.al. | 2410.21338 | null |
| 2024-10-28 | Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving | Jiyao Wang et.al. | 2410.21086 | null |
| 2024-10-27 | Towards a Blockchain and Opportunistic Edge Driven Metaverse of Everything | Paula Fraga-Lamas et.al. | 2410.20594 | null |
| 2024-10-27 | Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation | Maohao Shen et.al. | 2410.20336 | null |
| 2024-10-27 | GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields | Yusuke Sekikawa et.al. | 2410.20306 | null |
| 2024-11-12 | LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Jiazuo Yu et.al. | 2410.20178 | link |
| 2024-10-25 | DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction | Zelin Zang et.al. | 2410.19504 | link |
| 2025-01-27 | Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis | Weikai Li et.al. | 2410.19225 | link |
| 2024-10-24 | Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Ruisi Cai et.al. | 2410.19123 | link |
| 2024-10-24 | Mixture of Parrots: Experts improve memorization more than reasoning | Samy Jelassi et.al. | 2410.19034 | null |
| 2024-10-24 | MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases | Zhisheng Lin et.al. | 2410.18406 | null |
| 2024-10-23 | Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches | Kexin Feng et.al. | 2410.18298 | null |
| 2024-10-23 | MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Jingfan Zhang et.al. | 2410.18035 | null |
| 2024-10-24 | ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | Xin He et.al. | 2410.17954 | null |
| 2024-10-23 | Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition | Artem Basharin et.al. | 2410.17765 | null |
| 2024-10-22 | Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling | Jialong Li et.al. | 2410.17043 | null |
| 2024-10-21 | LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | Ruikun Zhang et.al. | 2410.16095 | link |
| 2024-10-22 | CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts | Zhenpeng Su et.al. | 2410.16077 | link |
| 2024-10-29 | Generalizing Motion Planners with Mixture of Experts for Autonomous Driving | Qiao Sun et.al. | 2410.15774 | link |
| 2024-11-23 | ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts | Xumeng Han et.al. | 2410.15732 | null |
| 2024-10-20 | Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs | Xin Zhou et.al. | 2410.15438 | null |
| 2024-11-16 | LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration | Yuang Ai et.al. | 2410.15385 | link |
| 2024-10-19 | MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning | Suning Huang et.al. | 2410.14972 | null |
| 2024-10-29 | Collaboratively adding new knowledge to an LLM | Rhui Dih Lee et.al. | 2410.14753 | link |
| 2024-10-18 | MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts | Rachel S. Y. Teo et.al. | 2410.14574 | link |
| 2024-10-18 | Towards a Simple and Extensible Standard for Object-Centric Event Data (OCED) – Core Model, Design Space, and Lessons Learned | Dirk Fahland et.al. | 2410.14495 | link |
| 2024-10-18 | ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction | Haoyu He et.al. | 2410.14099 | link |
| 2024-10-17 | Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks | Jinze Zhao et.al. | 2410.13964 | null |
| 2024-10-18 | MoR: Mixture of Ranks for Low-Rank Adaptation Tuning | Chuanyu Tang et.al. | 2410.13408 | null |
| 2024-10-16 | Satellite-Terrestrial Quantum Networks and the Global Quantum Internet | Andrea Conti et.al. | 2410.13096 | null |
| 2024-10-16 | On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs | Herun Wan et.al. | 2410.12600 | null |
| 2024-10-16 | Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion | Minkyoung Cho et.al. | 2410.12592 | null |
| 2024-10-16 | Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts | Fanqi Yan et.al. | 2410.12258 | null |
| 2025-01-03 | EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference | Yulei Qian et.al. | 2410.12247 | null |
| 2024-10-15 | MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router | Yanyue Xie et.al. | 2410.12013 | null |
| 2024-10-15 | MoH: Multi-Head Attention as Mixture-of-Head Attention | Peng Jin et.al. | 2410.11842 | link |
| 2024-10-15 | GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation | Fei Tang et.al. | 2410.11841 | link |
| 2024-10-15 | Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models | James Vo et.al. | 2410.11654 | null |
| 2024-10-16 | Quadratic Gating Functions in Mixture of Experts: A Statistical Insight | Pedram Akbarian et.al. | 2410.11222 | null |
| 2024-10-19 | AT-MoE: Adaptive Task-planning Mixture of Experts via LoRA Approach | Xurui Li et.al. | 2410.10896 | null |
| 2024-10-01 | Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models | Keivan Alizadeh et.al. | 2410.10846 | null |
| 2024-10-16 | Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free | Ziyue Li et.al. | 2410.10814 | link |
| 2024-10-14 | Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts | Guorui Zheng et.al. | 2410.10626 | link |
| 2024-10-14 | Learning to Ground VLMs without Forgetting | Aritra Bhowmik et.al. | 2410.10491 | null |
| 2024-10-14 | Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Xu Liu et.al. | 2410.10469 | null |
| 2024-10-15 | Ada-K Routing: Boosting the Efficiency of MoE-based LLMs | Tongtian Yue et.al. | 2410.10456 | null |
| 2024-10-14 | Tighter Risk Bounds for Mixtures of Experts | Wissam Akretche et.al. | 2410.10397 | null |
| 2024-10-24 | Scalable Multi-Domain Adaptation of Language Models using Modular Experts | Peter Schafhalter et.al. | 2410.10181 | null |
| 2024-10-16 | Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models | Jun Luo et.al. | 2410.10114 | null |
| 2024-10-14 | AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality | Peijun Qing et.al. | 2410.10054 | link |
| 2024-10-13 | ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL | Zhanqiu Guo et.al. | 2410.09781 | null |
| 2024-10-13 | MoIN: Mixture of Introvert Experts to Upcycle an LLM | Ajinkya Tejankar et.al. | 2410.09687 | null |
| 2024-10-12 | GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks | Dingyi Zhuang et.al. | 2410.09570 | null |
| 2024-10-11 | Semi-Supervised Learning of Noisy Mixture of Experts Models | Oh-Ran Kwon et.al. | 2410.09039 | null |
| 2024-10-11 | Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering | I-Chun Chen et.al. | 2410.08589 | null |
| 2024-10-31 | Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Sukwon Yun et.al. | 2410.08245 | link |
| 2024-11-20 | Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | Gen Luo et.al. | 2410.08202 | null |
| 2024-10-10 | Efficient Dictionary Learning with Switch Sparse Autoencoders | Anish Mudide et.al. | 2410.08201 | link |
| 2024-10-18 | More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing | Sagi Shaier et.al. | 2410.08003 | null |
| 2024-10-10 | SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture | Jiayi Han et.al. | 2410.07739 | null |
| 2024-10-10 | Upcycling Large Language Models into Mixture of Experts | Ethan He et.al. | 2410.07524 | null |
| 2024-10-09 | User Feedback in Continuous Software Engineering: Revealing the State-of-Practice | Anastasiia Tkalich et.al. | 2410.07459 | null |
| 2024-10-11 | MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Peng Jin et.al. | 2410.07348 | null |
| 2024-10-04 | A Dynamic Approach to Stock Price Prediction: Comparing RNN and Mixture of Experts Models Across Different Volatility Profiles | Diego Vallarino et.al. | 2410.07234 | null |
| 2024-10-09 | Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders | David Noever et.al. | 2410.06462 | null |
| 2024-10-09 | Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs | Ruijia Niu et.al. | 2410.06431 | null |
| 2024-10-08 | Probing the Robustness of Theory of Mind in Large Language Models | Christian Nickel et.al. | 2410.06271 | null |
| 2024-10-08 | MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | Wei Huang et.al. | 2410.06270 | link |
| 2024-12-17 | Aria: An Open Multimodal Native Mixture-of-Experts Model | Dongxu Li et.al. | 2410.05993 | link |
| 2024-10-08 | Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models | Siqi Wang et.al. | 2410.05661 | null |
| 2024-12-05 | Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | Xinyu Zhao et.al. | 2410.05357 | link |
| 2024-10-07 | Multimodal Fusion Strategies for Mapping Biophysical Landscape Features | Lucia Gordon et.al. | 2410.04833 | link |
| 2024-10-06 | Realizing Video Summarization from the Path of Language-based Semantic Understanding | Kuan-Chen Mu et.al. | 2410.04511 | null |
| 2024-10-09 | Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding | Wei Wu et.al. | 2410.03553 | null |
| 2024-10-04 | Exploring the Benefit of Activation Sparsity in Pre-training | Zhengyan Zhang et.al. | 2410.03440 | link |
| 2024-10-03 | MLP-KAN: Unifying Deep Representation and Function Learning | Yunhong He et.al. | 2410.03027 | link |
| 2024-10-03 | On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions | Huy Nguyen et.al. | 2410.02935 | null |
| 2024-10-03 | Neutral residues: revisiting adapters for model extension | Franck Signe Talla et.al. | 2410.02744 | null |
| 2024-10-03 | Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping | Ziye Huang et.al. | 2410.02475 | null |
| 2024-10-03 | MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction | Zhaojian Yu et.al. | 2410.02241 | null |
| 2024-10-03 | Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts | Minh Le et.al. | 2410.02200 | null |
| 2024-10-04 | Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices | Andres Potapczynski et.al. | 2410.02117 | link |
| 2024-10-04 | EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing | Haotian Sun et.al. | 2410.02098 | null |
| 2024-10-02 | Don’t flatten, tokenize! Unlocking the key to SoftMoE’s efficacy in deep RL | Ghada Sokar et.al. | 2410.01930 | null |
| 2024-09-15 | Integrating AI’s Carbon Footprint into Risk Management Frameworks: Strategies and Tools for Sustainable Compliance in Banking Sector | Nataliya Tkachenko et.al. | 2410.01818 | null |
| 2024-10-02 | Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Shayekh Bin Islam et.al. | 2410.01782 | link |
| 2024-10-02 | TIC 290061484: A Triply Eclipsing Triple System with the Shortest Known Outer Period of 24.5 Days | Veselin B. Kostov et.al. | 2410.01711 | null |
| 2024-10-02 | Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging | Tingfeng Hui et.al. | 2410.01610 | null |
| 2024-10-02 | The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs | Hong Li et.al. | 2410.01417 | null |
| 2024-10-01 | MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards | Sheng Wang et.al. | 2410.00938 | null |
| 2024-10-01 | UniAdapt: A Universal Adapter for Knowledge Calibration | Tai D. Nguyen et.al. | 2410.00454 | null |
| 2024-10-01 | Robust Traffic Forecasting against Spatial Shift over Years | Hongjun Wang et.al. | 2410.00373 | link |
| 2024-09-29 | IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method | Chaohui Xu et.al. | 2410.00059 | null |
| 2024-09-30 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Haotian Zhang et.al. | 2409.20566 | null |
| 2024-09-30 | HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models | Bingshen Mu et.al. | 2409.19878 | null |
| 2024-10-02 | CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Jihai Zhang et.al. | 2409.19291 | link |
| 2024-11-12 | SciDFM: A Large Language Model with Mixture-of-Experts for Science | Liangtai Sun et.al. | 2409.18412 | null |
| 2024-11-01 | Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Xun Zhu et.al. | 2409.17508 | link |
| 2024-09-26 | A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction | Guangyu Wang et.al. | 2409.17440 | link |
| 2024-09-24 | Leveraging Mixture of Experts for Improved Speech Deepfake Detection | Viola Negroni et.al. | 2409.16077 | null |
| 2024-10-02 | Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts | Xiaoming Shi et.al. | 2409.16040 | link |
| 2024-10-31 | Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM | Fengrun Zhang et.al. | 2409.15905 | null |
| 2024-09-24 | Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks | Jiayi He et.al. | 2409.15695 | null |
| 2024-12-13 | A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts | Hugo Inzirillo et.al. | 2409.15161 | link |
| 2024-09-23 | Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond | Hong Chen et.al. | 2409.14993 | null |
| 2024-09-21 | Routing in Sparsely-gated Language Models responds to Context | Stefan Arnold et.al. | 2409.14107 | null |
| 2024-10-01 | On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists | Dongyang Fan et.al. | 2409.13931 | link |
| 2024-09-20 | Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning | Annette Spooner et.al. | 2409.13791 | null |
| 2024-09-19 | On the rationality problem for hypersurfaces | Jan Lange et.al. | 2409.12834 | null |
| 2024-09-19 | Retrieval-Augmented Test Generation: How Far Are We? | Jiho Shin et.al. | 2409.12682 | null |
| 2024-09-19 | Robust Audiovisual Speech Recognition Models with Mixture-of-Experts | Yihan Wu et.al. | 2409.12370 | null |
| 2024-09-18 | Mixture of Diverse Size Experts | Manxi Sun et.al. | 2409.12210 | null |
| 2024-09-18 | GRIN: GRadient-INformed MoE | Liyuan Liu et.al. | 2409.12136 | null |
| 2024-09-18 | Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0 | Zhiyong Wang et.al. | 2409.11909 | null |
| 2024-09-17 | LPT++: Efficient Training on Mixture of Long-tailed Experts | Bowen Dong et.al. | 2409.11323 | null |
| 2024-12-09 | LOLA – An Open-Source Massively Multilingual Large Language Model | Nikit Srivastava et.al. | 2409.11272 | link |
| 2024-09-16 | Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression | Yi-Hsin Li et.al. | 2409.10101 | null |
| 2024-11-20 | MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Enming Zhang et.al. | 2409.07267 | link |
| 2024-09-10 | DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models | Maryam Akhavan Aghdam et.al. | 2409.06669 | null |
| 2024-09-10 | STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | Jaeseong Lee et.al. | 2409.06211 | null |
| 2024-10-31 | VE: Modeling Multivariate Time Series Correlation with Variate Embedding | Shangjiong Wang et.al. | 2409.06169 | link |
| 2024-09-09 | Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models | Hongyang Lei et.al. | 2409.05929 | null |
| 2024-09-09 | Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks | Bo Xu et.al. | 2409.05726 | null |
| 2024-09-09 | Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection | Tianwu Lei et.al. | 2409.05611 | null |
| 2024-09-06 | Hot Stars in the GALEX Ultraviolet Sky Surveys (GUVcat_AISxSDSS_HS) and the Binary Fraction of Hot Evolved Stars | Luciana Bianchi et.al. | 2409.04626 | null |
| 2024-09-05 | Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions | Zemian Ke et.al. | 2409.03282 | null |
| 2024-09-05 | ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding | Zhengzhuo Xu et.al. | 2409.03277 | null |
| 2024-09-05 | xLAM: A Family of Large Action Models to Empower AI Agent Systems | Jianguo Zhang et.al. | 2409.03215 | link |
| 2024-09-04 | Configurable Foundation Models: Building LLMs from a Modular Perspective | Chaojun Xiao et.al. | 2409.02877 | null |
| 2024-09-04 | Pluralistic Salient Object Detection | Xuelu Feng et.al. | 2409.02368 | null |
| 2024-09-03 | OLMoE: Open Mixture-of-Experts Language Models | Niklas Muennighoff et.al. | 2409.02060 | link |
| 2024-09-05 | Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model | Hukai Huang et.al. | 2409.02050 | null |
| 2024-09-03 | BEAVER: An Enterprise Benchmark for Text-to-SQL | Peter Baile Chen et.al. | 2409.02038 | null |
| 2024-09-03 | Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information | Xinyu Zhang et.al. | 2409.01605 | null |
| 2024-09-02 | Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning | Soumajyoti Sarkar et.al. | 2409.01483 | null |
| 2024-09-02 | Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching | Sungmin Yun et.al. | 2409.01141 | null |
| 2024-09-04 | Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack | Guanzhong Chen et.al. | 2409.00960 | link |
| 2024-09-02 | Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts | Youngseog Chung et.al. | 2409.00879 | null |
| 2024-09-11 | Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts | Rhui Dih Lee et.al. | 2408.17280 | null |
| 2024-08-29 | Gradient-free variational learning with conditional mixture networks | Conor Heins et.al. | 2408.16429 | link |
| 2024-09-07 | Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | Yuncheng Yang et.al. | 2408.15915 | link |
| 2024-08-28 | Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts | Nikolas Gritsch et.al. | 2408.15901 | null |
| 2024-10-23 | LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Fangxun Shu et.al. | 2408.15881 | link |
| 2024-08-28 | Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts | Lean Wang et.al. | 2408.15664 | null |
| 2024-08-27 | Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis | Sakhinana Sagar Srinivas et.al. | 2408.15305 | null |
| 2024-08-28 | A Survey of Large Language Models for European Languages | Wazir Ali et.al. | 2408.15040 | null |
| 2024-08-27 | MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce | Hao Jiang et.al. | 2408.14968 | null |
| 2024-08-24 | Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings | Sagar Srinivas Sakhinana et.al. | 2408.13622 | null |
| 2024-09-11 | Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler | Yikang Shen et.al. | 2408.13359 | null |
| 2024-10-30 | The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities | Venkatesh Balavadhani Parthasarathy et.al. | 2408.13296 | null |
| 2024-08-23 | Guiding IoT-Based Healthcare Alert Systems with Large Language Models | Yulan Gao et.al. | 2408.13071 | null |
| 2024-08-23 | O-Mamba: O-shape State-Space Model for Underwater Image Enhancement | Chenyu Dong et.al. | 2408.12816 | link |
| 2024-08-23 | DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation | Xiaowei Mao et.al. | 2408.12809 | null |
| 2024-08-23 | Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth | Yuxiang Wei et.al. | 2408.12803 | null |
| 2024-08-23 | La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection | Hang Zou et.al. | 2408.12793 | null |
| 2024-10-02 | SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging | Mohammadreza Pourreza et.al. | 2408.12733 | null |
| 2024-08-22 | Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Jamba Team et.al. | 2408.12570 | null |
| 2024-09-09 | Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators | Dingkang Yang et.al. | 2408.12325 | null |
| 2024-08-15 | FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models | Zhongyu Zhao et.al. | 2408.11855 | link |
| 2024-08-21 | MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing | Hao Zhou et.al. | 2408.11396 | link |
| 2024-08-21 | KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? | Xiao Han et.al. | 2408.11306 | link |
| 2024-08-21 | FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts | Hanzi Mei et.al. | 2408.11304 | null |
| 2024-08-27 | Unboxing Occupational Bias: Grounded Debiasing of LLMs with U.S. Labor Data | Atmika Gorti et.al. | 2408.11247 | null |
| 2024-08-25 | Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting | Jianxiang Zhou et.al. | 2408.10822 | link |
| 2024-08-20 | AnyGraph: Graph Foundation Model in the Wild | Lianghao Xia et.al. | 2408.10700 | link |
| 2024-08-20 | HMoE: Heterogeneous Mixture of Experts for Language Modeling | An Wang et.al. | 2408.10681 | null |
| 2024-08-19 | AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference | Shuzhang Zhong et.al. | 2408.10284 | link |
| 2024-10-29 | FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models | Xiaochen Wang et.al. | 2408.10276 | link |
| 2024-08-26 | SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | Anke Tang et.al. | 2408.10174 | link |
| 2024-11-01 | Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Xiaoyu Kong et.al. | 2408.10159 | link |
| 2024-08-19 | A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method | Hang Zou et.al. | 2408.09752 | null |
| 2024-08-16 | Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection | Haohao Zhu et.al. | 2408.08551 | null |
| 2024-08-17 | BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts | Qizhen Zhang et.al. | 2408.08274 | null |
| 2024-05-21 | Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts | Yunxin Li et.al. | 2405.11273 | null |
| 2024-05-31 | Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | Xudong Lu et.al. | 2402.14800 | null |
| 2024-10-29 | GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts | Shirley Wu et.al. | 2312.04693 | null |
| 2023-09-12 | Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning | Ted Zadouri et.al. | 2309.05444 | null |
| 2023-04-25 | Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism | Xin Chen et.al. | 2304.11414 | null |
| 2018-06-22 | Mixtures of Experts Models | Isobel Claire Gormley et.al. | 1806.08200 | null |
Speculative Decoding
| Publish Date | Title | Authors | Code | |
|---|---|---|---|---|
| 2026-04-02 | Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding | Tao Jin et.al. | 2604.02047 | null |
| 2026-04-02 | Reinforcement Learning for Speculative Trading under Exploratory Framework | Yun Zhao et.al. | 2604.02035 | null |
| 2026-04-02 | Phonon Thermal Hall Effect in quartz and its absence in silica | Yu Ling et.al. | 2604.01908 | null |
| 2026-03-31 | Frege in the Flesh: Biolinguistics and the Neural Enforcement of Syntactic Structures | Elliot Murphy et.al. | 2604.00291 | null |
| 2026-03-31 | Spatially modulated morphotropic phase boundaries in a compressively strained multiferroic thin film | Ting-Ran Liu et.al. | 2604.00288 | null |
| 2026-03-31 | Blockspace Under Pressure: An Analysis of Spam MEV on High-Throughput Blockchains | Wenhao Wang et.al. | 2604.00234 | null |
| 2026-03-31 | Cloudy With a Chance of Meatballs | Wolf Cukier et.al. | 2603.29883 | null |
| 2026-03-31 | Detecting speculative leaks with compositional semantics | Xaver Fabian et.al. | 2603.29800 | null |
| 2026-03-31 | Milky Way evolution on a human timescale | Eugene et.al. | 2603.29503 | null |
| 2026-03-31 | Mexican Burrowing Toads as gravitational wave detectors | Frederic V. Hessman et.al. | 2603.29334 | null |
| 2026-03-30 | The Binary-Binary Hierarchical System XY Leo: A Laboratory for Stellar Activity and Concealed Companions | D. Koçak et.al. | 2603.28934 | null |
| 2026-04-02 | A Black Hole Star at Cosmic Noon: Extreme Balmer break, photospheric continuum, and broad absorption by thick winds in a Little Red Dot at z=1.7 | Alberto Torralba et.al. | 2603.28335 | null |
| 2026-03-30 | Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting | Zhen Zou et.al. | 2603.28049 | null |
| 2026-03-28 | SJD-VP: Speculative Jacobi Decoding with Verification Prediction for Autoregressive Image Generation | Bingqi Shan et.al. | 2603.27115 | null |
| 2026-03-27 | TAPS: Task Aware Proposal Distributions for Speculative Sampling | Mohamad Zbib et.al. | 2603.27027 | null |
| 2026-03-26 | S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation | Ligong Han et.al. | 2603.25702 | null |
| 2026-03-26 | Bulge Fossil Fragments as a new population of factories of gravitational wave sources in the Galaxy | F. R. Ferraro et.al. | 2603.25127 | null |
| 2026-03-26 | Decoding Market Emotions in Cryptocurrency Tweets via Predictive Statement Classification with Machine Learning and Transformers | Moein Shahiki Tash et.al. | 2603.24933 | null |
| 2026-03-25 | Quantum walk with a local spin interaction | Manami Yamagishi et.al. | 2603.24444 | null |
| 2026-03-25 | AI Fortune-Teller: Juxtaposing Shaman and AI to Reveal Human Agency in the Age of AI | Soonho Kwon et.al. | 2603.23811 | null |
| 2026-03-24 | Mars in the Australian Press, 1875-1899. 1. Interpretation, Authority and Planetary Science | Richard de Grijs et.al. | 2603.23563 | null |
| 2026-03-24 | SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning | Haoyu Huang et.al. | 2603.23483 | null |
| 2026-03-24 | RelayS2S: A Dual-Path Speculative Generation for Real-Time Dialogue | Long Mai et.al. | 2603.23346 | null |
| 2026-03-24 | Mars excitement in Australian newspapers, 1877-1899: Humour and the public negotiation of astronomical knowledge | Richard de Grijs et.al. | 2603.22906 | null |
| 2026-03-23 | From Brittle to Robust: Improving LLM Annotations for SE Optimization | Lohith Senthilkumar et.al. | 2603.22474 | null |
| 2026-03-24 | Dynamic analysis enhances issue resolution | Mingwei Liu et.al. | 2603.22048 | null |
| 2026-03-22 | On the origin of the strong internal magnetic fields of central compact objects | Kazım Yavuz Ekşi et.al. | 2603.21103 | null |
| 2026-03-21 | SWE-Next: Scalable Real-World Software Engineering Tasks for Agents | Jiarong Liang et.al. | 2603.20691 | null |
| 2026-03-21 | AEGIS: From Clues to Verdicts – Graph-Guided Deep Vulnerability Reasoning via Dialectics and Meta-Auditing | Sen Fang et.al. | 2603.20637 | null |
| 2026-03-20 | Does This Gradient Spark Joy? | Ian Osband et.al. | 2603.20526 | null |
| 2026-03-23 | ParallelVLM: Lossless Video-LLM Acceleration with Visual Alignment Aware Parallel Speculative Decoding | Quan Kong et.al. | 2603.19610 | null |
| 2026-03-19 | Beyond the Desk: Barriers and Future Opportunities for AI to Assist Scientists in Embodied Physical Tasks | Irene Hou et.al. | 2603.19504 | null |
| 2026-03-19 | Speculative Policy Orchestration: A Latency-Resilient Framework for Cloud-Robotic Manipulation | Chanh Nguyen et.al. | 2603.19418 | null |
| 2026-03-19 | The Uncertain Policy Price of Scaling Direct Air Capture | Leonardo Chiani et.al. | 2603.19143 | null |
| 2026-03-19 | A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference | Yida Zhang et.al. | 2603.19133 | null |
| 2026-03-19 | In the Margins: An Empirical Study of Ethereum Inscriptions | Xihan Xiong et.al. | 2603.19086 | null |
| 2026-03-19 | Act While Thinking: Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution | Yifan Sui et.al. | 2603.18897 | null |
| 2026-03-19 | SJD-PAC: Accelerating Speculative Jacobi Decoding via Proactive Drafting and Adaptive Continuation | Jialiang Kang et.al. | 2603.18599 | null |
| 2026-03-19 | Dream the Dream: Futuring Communication between LGBTQ+ and Cisgender Groups in Metaverse | Anqi Wang et.al. | 2603.18578 | null |
| 2026-03-19 | SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding | Shenggui Li et.al. | 2603.18567 | null |
| 2026-03-18 | Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing | Raghavv Goel et.al. | 2603.17942 | null |
| 2026-03-18 | HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness | Zihao Zheng et.al. | 2603.17573 | null |
| 2026-03-18 | “Not Just Me and My To-Do List”: Understanding Challenges of Task Management for Adults with ADHD and the Need for AI-Augmented Social Scaffolds | Jingruo Chen et.al. | 2603.17258 | null |
| 2026-03-17 | Search For a Counterpart to the Subsolar Mass Gravitational Wave Candidate S251112cm | Nicholas Vieira et.al. | 2603.17009 | null |
| 2026-03-17 | Characterizing Delusional Spirals through Human-LLM Chat Logs | Jared Moore et.al. | 2603.16567 | null |
| 2026-03-17 | SpecSteer: Synergizing Local Context and Global Reasoning for Efficient Personalized Generation | Hang Lv et.al. | 2603.16219 | null |
| 2026-03-17 | Efficient LLM Serving for Agentic Workflows: A Data Systems Perspective | Noppanat Wadlom et.al. | 2603.16104 | null |
| 2026-03-16 | Evasive Intelligence: Lessons from Malware Analysis for Evaluating AI Agents | Simone Aonzo et.al. | 2603.15457 | null |
| 2026-03-16 | The ALMA-QUARKS Survey: Evidence of an Explosive Molecular Outflow in IRAS 15520–5234 | Ariful Hoque et.al. | 2603.15040 | null |
| 2026-03-16 | MMSpec: Benchmarking Speculative Decoding for Vision-Language Models | Hui Shen et.al. | 2603.14989 | null |
| 2026-03-16 | Hyper-learning and Unlearning: A Narrative Speculation on Urbanism in Media Ecologies | Anqi Wang et.al. | 2603.14810 | null |
| 2026-03-14 | Early Rug Pull Warning for BSC Meme Tokens via Multi-Granularity Wash-Trading Pattern Profiling | Dingding Cao et.al. | 2603.13830 | null |
| 2026-03-14 | Measuring Primitive Accumulation: An Information-Theoretic Approach to Capitalist Enclosure in PIK2, Indonesia | Sandy Hardian Susanto Herho et.al. | 2603.13715 | null |
| 2026-03-13 | Towards Fluent Interaction with Cyber-Physical Architecture | Jesse T. Gonzalez et.al. | 2603.13633 | null |
| 2026-03-13 | When Drafts Evolve: Speculative Decoding Meets Online Learning | Yu-Yang Qian et.al. | 2603.12617 | null |
| 2026-03-12 | Design Exploration of Lightweight Interactions for Awareness-Supporting Technologies in Hybrid Work | Lu Liu et.al. | 2603.11977 | null |
| 2026-03-12 | Edge-Cloud Collaborative Speech Emotion Captioning via Token-Level Speculative Decoding in Audio-Language Models | Xiangyuan Xue et.al. | 2603.11397 | null |
| 2026-03-11 | One-loop mass corrections and decay widths of Type II heavy string states | Massimo Bianchi et.al. | 2603.11343 | null |
| 2026-03-11 | Self-Speculative Decoding for LLM-based ASR with CTC Encoder Drafts | George Saon et.al. | 2603.11243 | null |
| 2026-03-11 | Chasing RATs: Tracing Reading for and as Creative Activity | Sophia Liu et.al. | 2603.11031 | null |
| 2026-03-11 | XMM-Newton Observation and Optical Monitoring of the Candidate Redback Millisecond Pulsar 1FGL J0523.5$-$2529 | J. P. Halpern et.al. | 2603.11028 | null |
| 2026-03-11 | Kinematics of Wolf-Rayet Stars in the LMC: Clues to Subtype Origins | Caden Burkhardt et.al. | 2603.10826 | null |
| 2026-03-11 | Supersonic flow of a Chaplygin gas past a conical wing with $Λ$-shaped cross sections | Minghong Han et.al. | 2603.10401 | null |
| 2026-03-10 | Intrinsic Numerical Robustness and Fault Tolerance in a Neuromorphic Algorithm for Scientific Computing | Bradley H. Theilman et.al. | 2603.10246 | null |
| 2026-03-10 | Phase diagram of 4D SU(3) Yang-Mills theory at $θ=π$ via imaginary theta simulations | Akira Matsumoto et.al. | 2603.09604 | null |
| 2026-03-10 | Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation | Luxi Lin et.al. | 2603.09527 | null |
| 2026-03-09 | ConFu: Contemplate the Future for Better Speculative Sampling | Zongyue Qin et.al. | 2603.08899 | null |
| 2026-03-09 | StreamReady: Learning What to Answer and When in Long Streaming Videos | Shehreen Azad et.al. | 2603.08620 | null |
| 2026-03-09 | Scalable On-the-fly Transcoding for Adaptive Streaming of Dynamic Point Clouds | Michael Rudolph et.al. | 2603.08417 | null |
| 2026-03-09 | Colloidal Probe Atomic Force Microscopy Reveals Anomalous Underscreening: A Matter of Experimental Conditions | Thomas Tilger et.al. | 2603.08326 | null |
| 2026-03-09 | EAGLE-Pangu: Accelerator-Safe Tree Speculative Decoding on Ascend NPUs | Chang Han et.al. | 2603.08088 | null |
| 2026-03-08 | DualSpec: Accelerating Deep Research Agents via Dual-Process Action Speculation | Shuzhang Zhong et.al. | 2603.07416 | null |
| 2026-03-07 | From debt crises to financial crashes (and back): a stock-flow consistent model for stock price bubbles | Matheus R. Grasselli et.al. | 2603.07213 | null |
| 2026-03-02 | SJD-PV: Speculative Jacobi Decoding with Phrase Verification for Autoregressive Image Generation | Zhehao Yu et.al. | 2603.06666 | null |
| 2026-03-06 | What are AI researchers worried about? | Cian O’Donovan et.al. | 2603.06223 | null |
| 2026-03-06 | EvoESAP: Non-Uniform Expert Pruning for Sparse MoE | Zongfang Liu et.al. | 2603.06003 | null |
| 2026-03-09 | Balancing Latency and Accuracy of Code Completion via Local-Cloud Model Cascading | Hanzhen Lu et.al. | 2603.05974 | null |
| 2026-03-05 | Balancing Coverage and Draft Latency in Vocabulary Trimming for Faster Speculative Decoding | Ofir Ben Shoham et.al. | 2603.05210 | null |
| 2026-03-04 | Quantum foundations for quantum technologies in the International Year of Quantum (2025) | Angelo Bassi et.al. | 2603.04630 | null |
| 2026-03-04 | Raman scattering spectroscopic observation of a ferroelastic crossover in bond-frustrated PrCd$_3$P$_3$ | Jackson Davis et.al. | 2603.04539 | null |
| 2026-03-04 | Weibel Instability-Driven Seed Magnetic Fields during Reionization | Jorie McDermott et.al. | 2603.03608 | null |
| 2026-03-03 | Accelerating OpenPangu Inference on NPU via Speculative Decoding | Yuntao Dai et.al. | 2603.03383 | null |
| 2026-03-03 | Speculative Speculative Decoding | Tanishq Kumar et.al. | 2603.03251 | null |
| 2026-03-03 | Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models | Shubhangi Upasani et.al. | 2603.02631 | null |
| 2026-03-02 | Latitude-Dependent Time Variations of the Solar Tachocline | Sarbani Basu et.al. | 2603.02321 | null |
| 2026-03-02 | Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning | Jiebin Zhang et.al. | 2603.01639 | null |
| 2026-03-02 | KERV: Kinematic-Rectified Speculative Decoding for Embodied VLA Models | Zihao Zheng et.al. | 2603.01581 | null |
| 2026-03-02 | Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification | Guang Huang et.al. | 2603.01399 | null |
| 2026-03-01 | Proscenium: Exploring Design Spaces of Layered Information Experience on a Large Dual-Layer Transparent Display | Chen Chen et.al. | 2603.01238 | null |
| 2026-02-27 | Stellar engines and Dyson bubbles can be stable | Colin R McInnes et.al. | 2603.00203 | null |
| 2026-02-27 | Betting under Common Beliefs: The Effect of Probability Weighting | Patrick Beissner et.al. | 2602.24194 | null |
| 2026-02-27 | Task-Centric Acceleration of Small-Language Models | Dor Tsur et.al. | 2602.24174 | null |
| 2026-02-27 | LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding | Alexander Samarin et.al. | 2602.23881 | null |
| 2026-02-27 | The Auton Agentic AI Framework | Sheng Cao et.al. | 2602.23720 | null |
| 2026-02-27 | Active Learning for Planet Habitability Classification under Extreme Class Imbalance | R. I. El-Kholy et.al. | 2602.23666 | null |
| 2026-02-25 | The shape of transverse momentum spectra in hybrid hydrodynamic models | Thiago S. Domingues et.al. | 2602.22490 | null |
| 2026-02-25 | BMN-like Matrix Models | Eunwoo Lee et.al. | 2602.22163 | null |
| 2026-02-25 | Speculating for Epiplexity: How to Learn the Most from Speculative Design? | Botao Amber Hu et.al. | 2602.22132 | null |
| 2026-02-25 | Tidal disruptions of rubble piles: The case of Phobos | Harrison Agrusa et.al. | 2602.21912 | null |
| 2026-02-24 | Asymptotically (un)safe scattering amplitudes from scratch: a deep dive into the IR jungle | Benjamin Knorr et.al. | 2602.21285 | null |
| 2026-02-23 | KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem | Seongjin Cha et.al. | 2602.20217 | null |
| 2026-02-23 | SemanticNVS: Improving Semantic Scene Understanding in Generative Novel View Synthesis | Xinya Chen et.al. | 2602.20079 | null |
| 2026-02-23 | Anisotropic magnons in a layered honeycomb ferromagnet | Travis J. Williams et.al. | 2602.19935 | null |
| 2026-02-23 | Two-parameter families of MPO integrals of motion in Heisenberg spin chains | Vsevolod I. Yashin et.al. | 2602.19741 | null |
| 2026-02-23 | Leap+Verify: Regime-Adaptive Speculative Weight Prediction for Accelerating Neural Network Training | Jeremy McEntire et.al. | 2602.19580 | null |
| 2026-02-21 | WANSpec: Leveraging Global Compute Capacity for LLM Inference | Noah Martin et.al. | 2602.18931 | null |
| 2026-02-19 | Insidious Imaginaries: A Critical Overview of AI Speculations | Dejan Grba et.al. | 2602.17383 | null |
| 2026-02-19 | Dynamic Delayed Tree Expansion For Improved Multi-Path Speculative Decoding | Rahul Thomas et.al. | 2602.16994 | null |
| 2026-02-19 | A testable framework for AI alignment: Simulation Theology as an engineered worldview for silicon-based agents | Josef A. Habdank et.al. | 2602.16987 | null |
| 2026-02-18 | Greedy Multi-Path Block Verification for Faster Decoding in Speculative Sampling | Rahul Thomas et.al. | 2602.16961 | null |
| 2026-02-18 | Privacy-Aware Split Inference with Speculative Decoding for Large Language Models over Wide-Area Networks | Michael Cunningham et.al. | 2602.16760 | null |
| 2026-02-17 | MoE-Spec: Expert Budgeting for Efficient Speculative Decoding | Bradley McDanel et.al. | 2602.16052 | null |
| 2026-02-17 | A Theoretical Approach to Stablecoin Design via Price Windows | Katherine Molinet et.al. | 2602.15981 | null |
| 2026-02-17 | Robot-Assisted Social Dining as a White Glove Service | Atharva S Kashyap et.al. | 2602.15767 | null |
| 2026-02-17 | Hot subdwarf stars from the Hamburg Quasar Survey | Ulrich Heber et.al. | 2602.15692 | null |
| 2026-02-17 | Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs | Libo Zhang et.al. | 2602.15318 | null |
| 2026-02-16 | Distributed Semi-Speculative Parallel Anisotropic Mesh Adaptation | Kevin Garner et.al. | 2602.15204 | null |
| 2026-02-16 | Kami of the Commons: Towards Designing Agentic AI to Steward the Commons | Botao Amber Hu et.al. | 2602.14940 | null |
| 2026-02-16 | Predicting the success of new crypto-tokens: the Pump.fun case | Giulio Marino et.al. | 2602.14860 | null |
| 2026-02-16 | Atomix: Timely, Transactional Tool Use for Reliable Agentic Workflows | Bardia Mohammadi et.al. | 2602.14849 | null |
| 2026-02-14 | Speculative Decoding with a Speculative Vocabulary | Miles Williams et.al. | 2602.13836 | null |
| 2026-02-14 | The Shadow Boss: Identifying Atomized Manipulations in Agentic Employment of XR Users using Scenario Constructions | Lik-Hang Lee et.al. | 2602.13622 | null |
| 2026-02-13 | ORAP: Optimized Row Access Prefetching for Rowhammer-mitigated Memory | Maccoy Merrell et.al. | 2602.13434 | null |
| 2026-02-13 | Training-Free Acceleration for Document Parsing Vision-Language Model with Hierarchical Speculative Decoding | Wenhui Liao et.al. | 2602.12957 | null |
| 2026-02-12 | Holographic Equidistribution | Nico Cooper et.al. | 2602.12265 | null |
| 2026-02-12 | Embodied AI Agents for Team Collaboration in Co-located Blue-Collar Work | Kaisa Vaananen et.al. | 2602.12136 | null |
| 2026-02-12 | Wisdom of the LLM Crowd: A Large Scale Benchmark of Multi-Label U.S. Election-Related Harmful Social Media Content | Qile Wang et.al. | 2602.11962 | null |
| 2026-02-11 | What do people want to fact-check? | Bijean Ghafouri et.al. | 2602.10935 | null |
| 2026-02-10 | Simulation of the Space-Charge-Limited Current Density for Time-Variant Pulsed Injection | H. Huang et.al. | 2602.09399 | null |
| 2026-02-10 | Understanding Risk and Dependency in AI Chatbot Use from User Discourse | Jianfeng Zhu et.al. | 2602.09339 | null |
| 2026-02-09 | PICASSO: Scaling CHERI Use-After-Free Protection to Millions of Allocations using Colored Capabilities | Merve Gülmez et.al. | 2602.09131 | null |
| 2026-02-09 | Benchmarking the Energy Savings with Speculative Decoding Strategies | Rohit Dutta et.al. | 2602.09113 | null |
| 2026-02-09 | Symplectic excision and distance rigidity | Yoel Groman et.al. | 2602.08969 | null |
| 2026-02-09 | Three Lessons from Citizen-Centric Participatory AI Design | Eike Schneiders et.al. | 2602.08554 | null |
| 2026-02-09 | On- and off-chain demand and supply drivers of Bitcoin price | Pavel Ciaian et.al. | 2602.08429 | null |
| 2026-02-09 | TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration | Linye Wei et.al. | 2602.08404 | null |
| 2026-02-10 | Compiler-Assisted Speculative Sampling for Accelerated LLM Inference on Heterogeneous Edge Devices | Alejandro Ruiz y Mesa et.al. | 2602.08060 | null |
| 2026-02-08 | Dark Matter as Screened Ordinary Matter | Colin D. Froggatt et.al. | 2602.07902 | null |
| 2026-02-07 | Motivic invariants of moduli stacks of Higgs bundles and bundles with connections: results and speculations | Roman Fedorov et.al. | 2602.07713 | null |
| 2026-02-07 | Series-Parallel-Loop Decompositions of Control-flow Graphs | Xuran Cai et.al. | 2602.07627 | null |
| 2026-02-07 | Astrophysical positronium and Dicke superradiance | Abdaljalel E. Alizzi et.al. | 2602.07489 | null |
| 2026-02-07 | Imagining the Alien: Human Projections and Cognitive Limitations | S. G. Djorgovski et.al. | 2602.07284 | null |
| 2026-02-06 | XShare: Collaborative in-Batch Expert Sharing for Faster MoE Inference | Daniil Vankov et.al. | 2602.07265 | null |
| 2026-02-06 | SpecAttn: Co-Designing Sparse Attention with Self-Speculative Decoding | Yikang Yue et.al. | 2602.07223 | null |
| 2026-02-06 | When RL Meets Adaptive Speculative Training: A Unified Training-Serving System | Junxiong Wang et.al. | 2602.06932 | null |
| 2026-02-06 | Continued fraction method for high overtone quasinormal modes in effective potentials with discontinuity | Guan-Ru Li et.al. | 2602.06536 | null |
| 2026-02-06 | RelayGen: Intra-Generation Model Switching for Efficient Reasoning | Jiwon Song et.al. | 2602.06454 | null |
| 2026-02-06 | Quenching Speculation in Quantum Markets via Entangled Neural Traders | Kieran Hymas et.al. | 2602.06367 | null |
| 2026-02-05 | DFlash: Block Diffusion for Flash Speculative Decoding | Jian Chen et.al. | 2602.06036 | null |
| 2026-02-05 | V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval | Dongyang Chen et.al. | 2602.06034 | null |
| 2026-02-05 | Multi-Token Prediction via Self-Distillation | John Kirchenbauer et.al. | 2602.06019 | null |
| 2026-02-05 | Measurement-Induced Dynamics of Particles and Quasiparticles in a Bose-Einstein-condensate array | Huy Nguyen et.al. | 2602.05924 | null |
| 2026-02-05 | Prompting Destiny: Negotiating Socialization and Growth in an LLM-Mediated Speculative Gameworld | Mandi Yang et.al. | 2602.05864 | null |
| 2026-02-05 | The near-continuum mechanism for extended Boltzmann theory: the non-equilibrium relaxation | Sha Liu et.al. | 2602.05775 | null |
| 2026-02-05 | Variational Speculative Decoding: Rethinking Draft Training from Token Likelihood to Sequence Acceptance | Xiandong Zou et.al. | 2602.05774 | null |
| 2026-02-05 | SDFP: Speculative Decoding with FIT-Pruned Models for Training-Free and Plug-and-Play LLM Acceleration | Hanyu Wei et.al. | 2602.05499 | null |
| 2026-02-05 | TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference | Jiyoung Park et.al. | 2602.05145 | null |
| 2026-02-04 | SPPAM: Signature Pattern Prediction and Access-Map Prefetcher | Maccoy Merrell et.al. | 2602.04100 | null |
| 2026-02-03 | pop-cosmos: Redshifts and physical properties of KiDS-1000 galaxies | Anik Halder et.al. | 2602.03930 | null |
| 2026-02-03 | SpecMD: A Comprehensive Study On Speculative Expert Prefetching | Duc Hoang et.al. | 2602.03921 | null |
| 2026-02-04 | Beyond Tokens: Semantic-Aware Speculative Decoding for Efficient Inference by Probing Internal States | Ximing Dong et.al. | 2602.03708 | null |
| 2026-02-03 | Efficient Algorithms for Partial Constraint Satisfaction Problems over Control-flow Graphs | Xuran Cai et.al. | 2602.03588 | null |
| 2026-02-02 | The emergent Big Bang scenario | Justin C. Feng et.al. | 2602.02646 | null |
| 2026-02-02 | An Empirical Study on Noisy Data and LLM Pretraining Loss Divergence | Qizhen Zhang et.al. | 2602.02400 | null |
| 2026-02-02 | PRISM: Parametrically Refactoring Inference for Speculative Sampling Draft Models | Xuliang Wang et.al. | 2602.01762 | null |
| 2026-02-02 | A Practical Tensor-Network Compression Pipeline for Production-Scale Large Language Models | Sergii Kozyrev et.al. | 2602.01613 | null |
| 2026-02-02 | Are Security Cues Static? Rethinking Warning and Trust Indicators for Life Transitions | Sarah Tabassum et.al. | 2602.01544 | null |
| 2026-02-01 | P-EAGLE: Parallel-Drafting EAGLE with Scalable Training | Mude Hui et.al. | 2602.01469 | null |
| 2026-02-01 | Improve the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models | Weiqing He et.al. | 2602.01428 | null |
| 2026-02-01 | FlowCast: Trajectory Forecasting for Scalable Zero-Cost Speculative Flow Matching | Divya Jyoti Bajpai et.al. | 2602.01329 | null |
| 2026-02-01 | PACER: Blockwise Pre-verification for Speculative Decoding with Adaptive Length | Situo Zhang et.al. | 2602.01274 | null |
| 2026-01-31 | Eternagram: Inspiring Climate Action Through LLM-based Conversational Exploration of a Post-Devastation Climate Future | Suifang Zhou et.al. | 2602.00571 | null |
| 2026-01-31 | SAGE: Accelerating Vision-Language Models via Entropy-Guided Adaptive Speculative Decoding | Yujia Tong et.al. | 2602.00523 | null |
| 2026-01-30 | TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification | Haoyun Jiang et.al. | 2601.23180 | null |
| 2026-01-30 | SpecIBT: Formally Verified Protection Against Speculative Control-Flow Hijacking | Jonathan Baumann et.al. | 2601.22978 | null |
| 2026-01-30 | Beyond Medical Chatbots: Meddollina and the Rise of Continuous Clinical Intelligence | Vaibhav Ram S. V. N. S et.al. | 2601.22645 | null |
| 2026-01-29 | Plant-Inspired Robot Design Metaphors for Ambient HRI | Victor Nikhil Antony et.al. | 2601.22387 | null |
| 2026-01-29 | Subsolar mass black holes from stellar collapse induced by primordial black holes | Thomas W. Baumgarte et.al. | 2601.22220 | null |
| 2026-01-29 | StarSD: One-for-Many Speculative Decoding | Junhao He et.al. | 2601.21622 | null |
| 2026-01-29 | SPOILER-GUARD: Gating Latency Effects of Memory Accesses through Randomized Dependency Prediction | Gayathri Subramanian et.al. | 2601.21211 | null |
| 2026-01-29 | Scaling Embeddings Outperforms Scaling Experts in Language Models | Hong Liu et.al. | 2601.21204 | null |
| 2026-01-28 | Unplugging a Seemingly Sentient Machine Is the Rational Choice – A Metaphysical Perspective | Erik J Bekkers et.al. | 2601.21016 | null |
| 2026-01-28 | Manipulation in Prediction Markets: An Agent-based Modeling Experiment | Bridget Smart et.al. | 2601.20452 | null |
| 2026-01-28 | TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs | Minjae Lee et.al. | 2601.20357 | null |
| 2026-01-26 | LTS-VoiceAgent: A Listen-Think-Speak Framework for Efficient Streaming Voice Interaction via Semantic Triggering and Incremental Reasoning | Wenhao Zou et.al. | 2601.19952 | null |
| 2026-01-27 | The Competence Crisis: A Design Fiction on AI-Assisted Research in Software Engineering | Mairieli Wessel et.al. | 2601.19628 | null |
| 2026-01-27 | DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference | Fuliang Liu et.al. | 2601.19278 | link |
| 2026-01-26 | Flatter Tokens are More Valuable for Speculative Draft Model Training | Jiaming Fan et.al. | 2601.18902 | null |
| 2026-01-26 | Towards a Proof of the Improved Quantum Null Energy Condition | Ido Ben-Dayan et.al. | 2601.18860 | null |
| 2026-01-26 | Disk-jet-wind coupling from stellar mass to supermassive black holes | Chris Done et.al. | 2601.18607 | null |
| 2026-01-30 | LLM-42: Enabling Determinism in LLM Inference with Verified Speculation | Raja Gond et.al. | 2601.17768 | null |
| 2026-01-24 | Improving User Privacy in Personalized Generation: Client-Side Retrieval-Augmented Modification of Server-Side Generated Speculations | Alireza Salemi et.al. | 2601.17569 | null |
| 2026-01-24 | Towards a Declarative Agentic Layer for Intelligent Agents in MCP-Based Server Ecosystems | Maria Jesus Rodriguez-Sanchez et.al. | 2601.17435 | null |
| 2026-01-24 | Auditing Disability Representation in Vision-Language Models | Srikant Panda et.al. | 2601.17348 | null |
| 2026-01-27 | From Clicks to Consensus: Collective Consent Assemblies for Data Governance | Lin Kyi et.al. | 2601.16752 | null |
| 2026-01-23 | Integrated Photonic Quantum Computing: From Silicon to Lithium Niobate | Hui Zhang et.al. | 2601.16484 | null |
| 2026-01-21 | MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification | Jingwei Song et.al. | 2601.15498 | null |
| 2026-01-23 | Emergent, not Immanent: A Baradian Reading of Explainable AI | Fabio Morreale et.al. | 2601.15029 | null |
| 2026-01-13 | On the Limits of Learned Importance Scoring for KV Cache Compression | Brady Steele et.al. | 2601.14279 | null |
| 2026-01-21 | The Non-Predictability of Mispredicted Branches using Timing Information | Ioannis Constantinou et.al. | 2601.13804 | null |
| 2026-01-19 | Quasinormal modes and their excitation beyond general relativity. II: isospectrality loss in gravitational waveforms | Hector O. Silva et.al. | 2601.13411 | null |
| 2026-01-19 | The Words That Can’t Be Shared: Exploring the Design of Unsent Messages | Michael Yin et.al. | 2601.13343 | null |
| 2026-01-19 | Time variations of the mean magnetic flux in active regions of different magneto-morphological classes | Anastasiya Zhukova et.al. | 2601.13168 | null |
| 2026-01-18 | SplittingSecrets: A Compiler-Based Defense for Preventing Data Memory-Dependent Prefetcher Side-Channels | Reshabh K Sharma et.al. | 2601.12270 | null |
| 2026-01-18 | Speculative Sampling with Reinforcement Learning | Chenan Wang et.al. | 2601.12212 | null |
| 2026-01-17 | A Dynamo Confinement Scenario for the Solar Tachocline and its Implications for Spin-down in the Radiative Spreading Regime | Loren I. Matilsky et.al. | 2601.11943 | null |
| 2026-01-16 | On Abnormal Execution Timing of Conditional Jump Instructions | Annika Wilde et.al. | 2601.11696 | null |
| 2026-01-15 | WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching | Xiangchen Li et.al. | 2601.11652 | null |
| 2026-01-16 | Spectral evolution of hot hybrid white dwarfs: II. Photometry | Semih Filiz et.al. | 2601.11191 | null |
| 2026-01-16 | Coexisting electronic smectic liquid crystal and superconductivity in a Si square-net semimetal | Christopher J. Butler et.al. | 2601.10939 | null |
| 2026-01-14 | Annealed Relaxation of Speculative Decoding for Faster Autoregressive Image Generation | Xingyao Li et.al. | 2601.09212 | null |
| 2026-01-14 | SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache | Chi-Chih Chang et.al. | 2601.09083 | null |
| 2026-01-13 | HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding | Qitan Lv et.al. | 2601.08273 | null |
| 2026-01-12 | Spacetime Quasicrystals | Latham Boyle et.al. | 2601.07769 | null |
| 2026-01-12 | Crypto Pricing with Hidden Factors | Matthew Brigida et.al. | 2601.07664 | null |
| 2026-01-12 | TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees | Tianyu Liu et.al. | 2601.07353 | null |
| 2026-01-11 | The AI Cognitive Trojan Horse: How Large Language Models May Bypass Human Epistemic Vigilance | Andrew D. Maynard et.al. | 2601.07085 | null |
| 2026-01-14 | A binary merger product as the direct progenitor of a Type II-P supernova | Zexi Niu et.al. | 2601.06577 | null |
| 2026-01-14 | VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit | Junda Lin et.al. | 2601.05755 | null |
| 2026-01-09 | Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding | Yuxuan Zhou et.al. | 2601.05724 | null |
| 2026-01-09 | Double: Breaking the Acceleration Limit via Double Retrieval Speculative Parallelism | Yuhao Shen et.al. | 2601.05524 | null |
| 2026-01-08 | Multi-Scale Local Speculative Decoding for Image Generation | Elia Peruzzo et.al. | 2601.05149 | null |
| 2026-01-08 | Revisiting Judge Decoding from First Principles via Training-Free Distributional Divergence | Shengyin Sun et.al. | 2601.04766 | null |
| 2026-01-08 | The UnScripted Trip: Fostering Policy Discussion on Future Human-Vehicle Collaboration in Autonomous Driving Through Design-Oriented Methods | Xinyan Yu et.al. | 2601.04601 | null |
| 2026-01-06 | Revisiting Speculative Leaderless Protocols for Low-Latency BFT Replication | Daniel Qian et.al. | 2601.03390 | null |
| 2026-01-06 | On the Hilbert-Chow crepant resolution conjecture | Denis Nesterov et.al. | 2601.03036 | null |
| 2026-01-08 | MiMo-V2-Flash Technical Report | Xiaomi LLM-Core Team et.al. | 2601.02780 | null |
| 2026-01-06 | Experience and Adaptation in AI-mediated Hiring Systems: A Combined Analysis of Online Discourse and Interface Design | Md Nazmus Sakib et.al. | 2601.02775 | null |
| 2026-01-06 | From Slaves to Synths? Superintelligence and the Evolution of Legal Personality | Simon Chesterman et.al. | 2601.02773 | null |
| 2026-01-06 | Hypothesize-Then-Verify: Speculative Root Cause Analysis for Microservices with Pathwise Parallelism | Lingzhe Zhang et.al. | 2601.02736 | null |
| 2026-01-05 | A modern perspective on Tutte’s homotopy theorem | Matthew Baker et.al. | 2601.02582 | null |
| 2026-01-06 | The Betelgeuse Enigma: The Betelbuddy Hypothesis | Priya Hasan et.al. | 2601.02012 | null |
| 2026-01-07 | FastV-RAG: Towards Fast and Fine-Grained Video QA with Retrieval-Augmented Generation | Gen Li et.al. | 2601.01513 | null |
| 2026-01-02 | FlexSpec: Frozen Drafts Meet Evolving Targets in Edge-Cloud Collaborative LLM Speculative Decoding | Yuchen Li et.al. | 2601.00644 | null |
| 2026-01-01 | MR-DAW: Towards Collaborative Digital Audio Workstations in Mixed Reality | Torin Hopkins et.al. | 2601.00326 | null |
| 2025-12-31 | The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition | Xiaoze Liu et.al. | 2601.00065 | null |
| 2025-12-29 | From Clay to Code: Typological and Material Reasoning in AI Interpretations of Iranian Pigeon Towers | Abolhassan Pishahang et.al. | 2601.00029 | null |
| 2025-12-31 | Intriguing Magnetocaloric Effect in 6H-perovskite Ba3RRu2O9 (R=Ho, Gd, Tb, Nd) with Strong 4d-4f Correlations | Mohit Kumar et.al. | 2512.24758 | null |
| 2025-12-29 | Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding | Yue Guan et.al. | 2512.23858 | null |
| 2025-12-29 | Entropy-Aware Speculative Decoding Toward Improved LLM Reasoning | Tiancheng Su et.al. | 2512.23765 | null |
| 2025-12-27 | Landauer cost in a continuous vacuum/no-vacuum measurement | Lorenzo Pirovano et.al. | 2512.23751 | null |
| 2025-12-29 | Soft Robotic Technological Probe for Speculative Fashion Futures | Amy Ingold et.al. | 2512.23570 | null |
| 2025-12-29 | Fuzzilicon: A Post-Silicon Microcode-Guided x86 CPU Fuzzer | Johannes Lenzen et.al. | 2512.23438 | null |
| 2025-12-28 | An Architecture-Led Hybrid Report on Body Language Detection Project | Thomson Tong et.al. | 2512.23028 | null |
| 2026-01-05 | AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing | Jiacheng Li et.al. | 2512.22455 | null |
| 2025-12-27 | Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving | Rui Li et.al. | 2512.22420 | null |
| 2025-12-26 | Eliminate Branches by Melding IR Instructions | Yuze Li et.al. | 2512.22390 | null |
| 2025-12-26 | Accelerate Speculative Decoding with Sparse Computation in Verification | Jikai Wang et.al. | 2512.21911 | null |
| 2025-12-26 | Fast Inference of Visual Autoregressive Model with Adjacency-Adaptive Dynamical Draft Trees | Haodong Lei et.al. | 2512.21857 | null |
| 2025-12-24 | dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning | Shirui Chen et.al. | 2512.21446 | null |
| 2025-12-24 | Parallel Token Prediction for Language Models | Felix Draxler et.al. | 2512.21323 | null |
| 2025-12-24 | Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning | Shengguang Wu et.al. | 2512.20934 | null |
| 2025-12-23 | Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs | Rui Pan et.al. | 2512.20573 | null |
| 2025-12-23 | DecoKAN: Interpretable Decomposition for Forecasting Cryptocurrency Market Dynamics | Yuan Gao et.al. | 2512.20028 | null |
| 2025-12-22 | Multimodal LLMs for Historical Dataset Construction from Archival Image Scans: German Patents (1877-1918) | Niclas Griesshaber et.al. | 2512.19675 | null |
| 2025-12-20 | Towards Efficient Agents: A Co-Design of Inference Architecture and System | Weizhe Lin et.al. | 2512.18337 | null |
| 2025-12-19 | Digital Bricolage: Design Speculations for Embodied Approaches to Digitized Print-based Cultural Collections | Malak Sadek et.al. | 2512.17590 | null |
| 2025-12-19 | Accelerating Multi-modal LLM Gaming Performance via Input Prediction and Mishit Correction | Ziyang Lin et.al. | 2512.17250 | null |
| 2025-12-18 | Machines, AI and the past//future of things | Karola Köpferl et.al. | 2512.16285 | null |
| 2025-12-18 | Fast Collaborative Inference via Distributed Speculative Decoding | Ce Zheng et.al. | 2512.16273 | null |
| 2025-12-17 | Optimizing Agentic Language Model Inference via Speculative Tool Calls | Daniel Nichols et.al. | 2512.15834 | null |
| 2025-12-14 | Variable Record Table: A Unified Hardware-Assisted Framework for Runtime Security | Suraj Kumar Sah et.al. | 2512.15777 | null |
| 2025-12-13 | TS-DP: Reinforcement Speculative Decoding For Temporal Adaptive Diffusion Policy Acceleration | Ye Li et.al. | 2512.15773 | null |
| 2025-12-17 | Probing the dynamics of stringy flux tubes with large $R$-charge | Davide Bonomi et.al. | 2512.15698 | null |
| 2025-12-17 | The longest known tails of ram-pressure stripped star-forming galaxies are caused by an ICM shock in Abell 1367 | H. W. Edler et.al. | 2512.15660 | null |
| 2025-12-17 | DEER: Draft with Diffusion, Verify with Autoregressive Models | Zicong Cheng et.al. | 2512.15176 | null |
| 2025-12-16 | Steering Alternative Realities through Local Quantum Memory Operations | Xiongfeng Ma et.al. | 2512.14377 | null |
| 2025-12-16 | PADE: A Predictor-Free Sparse Attention Accelerator via Unified Execution and Stage Fusion | Huizheng Wang et.al. | 2512.14322 | null |
| 2025-12-16 | The Impact Market to Save Conference Peer Review: Decoupling Dissemination and Credentialing | Karthikeyan Sankaralingam et.al. | 2512.14104 | null |
| 2025-12-16 | RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees | Junjie Ma et.al. | 2512.14069 | null |
| 2025-12-17 | Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models | Chendong Sun et.al. | 2512.13194 | null |
| 2025-12-14 | Spectral Theory of Almost Periodic Banach–Malcev Algebras and Applications to Moufang Dynamics | Marwa Ennaceur et.al. | 2512.12687 | null |
| 2025-12-16 | Mage: Cracking Elliptic Curve Cryptography with Cross-Axis Transformers | Lily Erickson et.al. | 2512.12483 | null |
| 2025-12-13 | Moduli stacks of quiver connections and non-Abelian Hodge theory | Mahmud Azam et.al. | 2512.12188 | null |
| 2025-12-13 | Binarity at LOw Metallicity (BLOeM): Projected rotational velocities | D. J. Lennon et.al. | 2512.12102 | null |
| 2025-12-12 | Universal Dynamics of Financial Bubbles in Isolated Markets: Evidence from the Iranian Stock Market | Ali Hosseinzadeh et.al. | 2512.12054 | null |
| 2025-12-11 | CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving | Dong Liu et.al. | 2512.11920 | null |
| 2025-12-12 | Speculative Decoding Speed-of-Light: Optimal Lower Bounds via Branching Random Walks | Sergey Pankratov et.al. | 2512.11718 | null |
| 2025-12-12 | AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference | Kuan-Wei Lu et.al. | 2512.11280 | null |
| 2025-12-12 | FutureWeaver: Planning Test-Time Compute for Multi-Agent Systems with Modularized Collaboration | Dongwon Jung et.al. | 2512.11213 | null |
| 2025-12-11 | Site Preference and Possible Coexistence of Antiferromagnetic Order and Magnetic Frustration in (Co1-xMgx)10Ge3O16 (0 <= x <= 30%) | Gina Angelo et.al. | 2512.11132 | null |
| 2025-12-11 | Mixing by offshore wind infrastructure: Resolving the density stratified wakes past vertical cylinders | Charlie J. Lloyd et.al. | 2512.10751 | null |
| 2025-12-11 | T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground | Dmitrii Stoianov et.al. | 2512.10430 | null |
| 2025-12-11 | Motifs in self-organising cells | Ying Chen Lim et.al. | 2512.10307 | null |
| 2025-12-10 | Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning | Logan Robbins et.al. | 2512.10054 | null |
| 2025-12-14 | GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference | Phuong Tran et.al. | 2512.09963 | null |
| 2025-12-10 | A Speculative GLRT-Backed Approach for Adversarial Resilience on Deep Learning-Based Array Processing | Nian-Cin Wang et.al. | 2512.09893 | null |
| 2025-12-10 | Baseline: Operation-Based Evolution and Versioning of Data | Jonathan Edwards et.al. | 2512.09762 | null |
| 2025-12-10 | WASP-12, shrouded in mystery or just cold gas? | Simon Daley-Yates et.al. | 2512.09593 | null |
| 2025-12-09 | Fast-ARDiff: An Entropy-informed Acceleration Framework for Continuous Space Autoregressive Generation | Zhen Zou et.al. | 2512.08537 | null |
| 2025-12-08 | Fair Benchmarking of Optimisation Applications | Frank Phillipson et.al. | 2512.07915 | null |
| 2025-11-30 | The Endogenous Constraint: Hysteresis, Stagflation, and the Structural Inhibition of Monetary Velocity in the Bitcoin Network (2016-2025) | Hamoon Soleimani et.al. | 2512.07886 | null |
| 2025-12-08 | Chemical complexity in star formation induced by stellar feedback: cores shock-formed by the supernova remnant W44 | G. Cosentino et.al. | 2512.07562 | null |
| 2025-12-08 | SJD++: Improved Speculative Jacobi Decoding for Training-free Acceleration of Discrete Auto-regressive Text-to-Image Generation | Yao Teng et.al. | 2512.07503 | null |
| 2025-12-06 | BitStopper: An Efficient Transformer Attention Accelerator via Stage-fusion and Early Termination | Huizheng Wang et.al. | 2512.06457 | null |
| 2025-12-05 | Protocol Futuring: Speculating Second-Order Dynamics of Protocols in Sociotechnical Infrastructural Futures | Botao ‘Amber’ Hu et.al. | 2512.06108 | null |
| 2025-12-05 | Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction | Ruihong Yin et.al. | 2512.05597 | null |
| 2025-12-09 | Arbitrage: Efficient Reasoning via Advantage-Aware Speculation | Monishwaran Maheswaran et.al. | 2512.05033 | null |
| 2025-12-04 | Long-term X-ray variability of the multiple-planet host L 98-59: Hints of an activity cycle | I. Pillitteri et.al. | 2512.04817 | null |
| 2025-12-04 | RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting | Siqi Wang et.al. | 2512.04752 | null |
| 2025-12-03 | Counting AdS Vacua | Zihni Kaan Baykara et.al. | 2512.04151 | null |
| 2025-12-01 | Humanity in the Age of AI: Reassessing 2025’s Existential-Risk Narratives | Mohamed El Louadi et.al. | 2512.04119 | null |
| 2025-12-02 | From Administrative Chaos to Analytical Cohorts: A Three-Stage Normalisation Pipeline for Longitudinal University Administrative Records | H. R. Paz et.al. | 2512.02936 | null |
| 2025-12-02 | A Human-centric Framework for Debating the Ethics of AI Consciousness Under Uncertainty | Zhou Ziheng et.al. | 2512.02544 | null |
| 2025-12-02 | SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification | Zhendong Tan et.al. | 2512.02337 | null |
| 2025-12-05 | Much Ado About Noising: Dispelling the Myths of Generative Robotic Control | Chaoyi Pan et.al. | 2512.01809 | null |
| 2025-12-01 | Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding | Yilong Zhao et.al. | 2512.01278 | null |
| 2025-11-30 | Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding | Pengfei Hu et.al. | 2512.00805 | null |
| 2025-11-30 | SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs | Jiaming Xu et.al. | 2512.00722 | null |
| 2025-11-30 | SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving | Bohan Zhao et.al. | 2512.00719 | null |
| 2025-11-29 | Speculating on the Role of Media Architecture in Post-disaster Rebuilding and Recovery: Insights from Architects and Interaction Designers | Berk Goksenin Tan et.al. | 2512.00537 | null |
| 2025-11-29 | Measuring Memecoin Fragility | Yuexin Xiang et.al. | 2512.00377 | null |
| 2025-12-04 | Retail Investor Horizon and Earnings Announcements | Domonkos F. Vamossy et.al. | 2512.00280 | null |
| 2025-12-05 | Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match | Jinze Li et.al. | 2511.22972 | null |
| 2025-12-03 | AI Deception: Risks, Dynamics, and Controls | Boyuan Chen et.al. | 2511.22619 | null |
| 2025-11-27 | LLM-Cave: A benchmark and light environment for large language models reasoning and decision-making system | Huanyu Li et.al. | 2511.22598 | null |
| 2025-11-26 | Dark Speculation: Combining Qualitative and Quantitative Understanding in Frontier AI Risk Analysis | Daniel Carpenter et.al. | 2511.21838 | null |
| 2025-11-26 | Nuclear Detonations as Probes of Hidden Superluminal Sectors | Karl Svozil et.al. | 2511.21793 | null |
| 2025-11-25 | The dynamic of a tax on land value : concepts, models and impact scenario | Hugo Spring-Ragain et.al. | 2511.21766 | null |
| 2025-11-24 | Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models | Linye Wei et.al. | 2511.21759 | null |
| 2025-12-01 | DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving | Fengze Yu et.al. | 2511.21669 | null |
| 2025-11-26 | Weak gravity at micron scales from dark bubble cosmology and its cosmological consequences | Ulf Danielsson et.al. | 2511.21362 | null |
| 2025-11-25 | FREE: Uncertainty-Aware Autoregression for Parallel Diffusion Transformers | Xinwan Wen et.al. | 2511.20390 | null |
| 2025-11-25 | Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios | Luohe Shi et.al. | 2511.20340 | null |
| 2025-11-25 | Adaptive LLM Agents: Toward Personalized Empathetic Care | Priyanka Singh et.al. | 2511.20080 | null |
| 2025-11-25 | Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design | Zixiao Huang et.al. | 2511.20048 | null |
| 2025-11-24 | Agint: Agentic Graph Compilation for Software Engineering Agents | Abhi Chivukula et.al. | 2511.19635 | null |
| 2025-11-24 | AI Consciousness and Existential Risk | Rufin VanRullen et.al. | 2511.19115 | null |
| 2025-11-24 | NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative Recommendations | Yejing Wang et.al. | 2511.18793 | null |
| 2025-11-22 | Accelerating Time Series Foundation Models with Speculative Decoding | Pranav Subbaraman et.al. | 2511.18191 | null |
| 2025-11-22 | Revisiting $γ$-Ray Orbital Modulation in the Redback Millisecond Pulsar PSR J2039-5617 | Mengqing Zhang et.al. | 2511.17900 | null |
| 2025-11-21 | Broadband X-ray observations of the periodic optical source ZTF J185139.81+171430.3 and its identification as a massive intermediate polar | Ren Deng et.al. | 2511.17800 | null |
| 2025-11-21 | Pre-cache: A Microarchitectural Solution to prevent Meltdown and Spectre | Subhash Sethumurugan et.al. | 2511.17726 | null |
| 2025-11-21 | Which active galaxies might be neutrino emitters? | Shuying Zhou et.al. | 2511.16869 | null |
| 2025-11-20 | Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter | Qinghao Hu et.al. | 2511.16665 | null |
| 2025-11-20 | An observationally based wind model contemporaneous with the radio detections in $τ$ Boötis | Dag Evensberget et.al. | 2511.16370 | null |
| 2025-11-21 | Fast LLM Post-training via Decoupled and Best-of-N Speculation | Rongxin Cheng et.al. | 2511.16193 | null |
| 2025-11-20 | Can Online GenAI Discussion Serve as Bellwether for Labor Market Shifts? | Shurui Cao et.al. | 2511.16028 | null |
| 2025-11-19 | Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization | Rahul Krishna Thomas et.al. | 2511.15898 | null |
| 2025-11-19 | Fossil group origins XIV: The radial orbits of A267 | S. Zarattini et.al. | 2511.15786 | null |
| 2025-11-19 | FlashMesh: Faster and Better Autoregressive Mesh Synthesis via Structured Speculation | Tingrui Shen et.al. | 2511.15618 | null |
| 2025-11-24 | Structural phase transitions in the van der Waals ferromagnets Fe$_x$Pd$_y$Te$_2$ | Rafaela F. S. Penacchio et.al. | 2511.15584 | null |
| 2025-11-19 | Cost-Aware Prediction (CAP): An LLM-Enhanced Machine Learning Pipeline and Decision Support System for Heart Failure Mortality Prediction | Yinan Yu et.al. | 2511.15357 | null |
| 2025-11-19 | Gaussian Blending: Rethinking Alpha Blending in 3D Gaussian Splatting | Junseo Koo et.al. | 2511.15102 | null |
| 2025-11-18 | Harmful Traits of AI Companions | W. Bradley Knox et.al. | 2511.14972 | null |
| 2025-11-18 | Photometric Constraints on Intermediate-mass Black Holes in the Galactic Centre | Tamojeet Roychowdhury et.al. | 2511.14856 | null |
| 2025-11-23 | Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning | Ruoyu Qin et.al. | 2511.14617 | null |
| 2025-11-18 | Positive AGN feedback in the outskirts of nearby barred spiral galaxies? | Bannanje Ananthamoorthy et.al. | 2511.14257 | null |
| 2025-11-18 | Enhanced UV emission knot in the giant radio galaxy NGC 315: Hint of patchy star formation? | Bannanje Ananthamoorthy et.al. | 2511.14252 | null |
| 2025-11-18 | MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts | Wenfeng Wang et.al. | 2511.14102 | null |
| 2025-11-17 | Beat the long tail: Distribution-Aware Speculative Decoding for RL Training | Zelei Shao et.al. | 2511.13841 | null |
| 2025-11-17 | VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping | Haotian Dong et.al. | 2511.13587 | null |
| 2025-11-17 | Tfin Crypto: From Speculation to Optimization in Risk Managed Crypto Portfolio Allocation | Thanh Nguyen et.al. | 2511.13239 | null |
| 2025-11-15 | Striking the Right Balance between Compute and Copy: Improving LLM Inferencing Under Speculative Decoding | Arun Ramachandran et.al. | 2511.12031 | null |
| 2025-11-15 | Educators on the Frontline: Philosophical and Realistic Perspectives on Integrating ChatGPT into the Learning Space | Surajit Das et.al. | 2511.11960 | null |
| 2025-11-13 | Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput | Jingwei Song et.al. | 2511.11733 | null |
| 2025-11-09 | Exploring Parallelism in FPGA-Based Accelerators for Machine Learning Applications | Sed Centeno et.al. | 2511.11640 | null |
| 2025-11-14 | Fast and Expressive Multi-Token Prediction with Probabilistic Circuits | Andreas Grivas et.al. | 2511.11346 | null |
| 2025-11-14 | Optimising Density Computations in Probabilistic Programs via Automatic Loop Vectorisation | Sangho Lim et.al. | 2511.11070 | null |
| 2025-11-13 | Widening of Binaries via Non-conservative Mass Transfer as a Formation Channel for Gaia Black Hole System | Aleksandra Olejak et.al. | 2511.10728 | null |
| 2025-11-12 | Evaluating from Benign to Dynamic Adversarial: A Squid Game for Large Language Models | Zijian Chen et.al. | 2511.10691 | null |
| 2025-11-08 | A Mathematical Framework for AI Singularity: Conditions, Bounds, and Control of Recursive Improvement | Akbar Anbar Jafari et.al. | 2511.10668 | null |
| 2025-11-13 | Steering Pretrained Drafters during Speculative Decoding | Frédéric Berdoz et.al. | 2511.09844 | null |
| 2025-11-12 | Emergent Dark Matter | Christian Canete et.al. | 2511.09034 | null |
| 2025-11-12 | TiDAR: Think in Diffusion, Talk in Autoregression | Jingyu Liu et.al. | 2511.08923 | null |
| 2025-11-14 | Kinematic scaling relations of disc galaxies from ionised gas at $z\sim 1$ and their connection with dark matter halos | Pavel E. Mancera Piña et.al. | 2511.08685 | null |
| 2025-11-11 | Parallel Sampling via Autospeculation | Nima Anari et.al. | 2511.07869 | null |
| 2025-11-11 | Critical Confabulation: Can LLMs Hallucinate for Social Good? | Peiqi Sui et.al. | 2511.07722 | null |
| 2025-11-10 | Look into your Heart – Prototypes for a Speculative Design Exploration of Personal Heart Rate Visualization | Swaroop Panda et.al. | 2511.07600 | null |
| 2025-11-08 | In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading | Shuning Lin et.al. | 2511.05814 | null |
| 2025-11-06 | The TeV emission of 3C273: inverse Compton radiation from shear-accelerated high-energy electrons in the large-scale jet? | F. Tavecchio et.al. | 2511.04433 | null |
| 2025-11-03 | TapOut: A Bandit-Based Approach to Dynamic Speculative Decoding | Aditya Sridhar et.al. | 2511.02017 | null |
| 2025-11-04 | Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding | Jungyeon Koh et.al. | 2511.01695 | null |
| 2025-11-03 | When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding | Min Fang et.al. | 2511.01282 | null |
| 2025-11-04 | SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding | Jameson Sandler et.al. | 2511.00606 | null |
| 2025-11-01 | Reject Only Critical Tokens: Pivot-Aware Speculative Decoding | Amir Ziashahabi et.al. | 2511.00351 | null |
| 2025-11-01 | Sherlock: Reliable and Efficient Agentic Workflow Execution | Yeonju Ro et.al. | 2511.00330 | null |
| 2025-10-31 | SpecAttn: Speculating Sparse Attention | Harsh Shah et.al. | 2510.27641 | null |
| 2025-10-30 | Kad: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral | Ayoub Hammal et.al. | 2510.27017 | null |
| 2025-10-30 | CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs | Zhiyuan Ning et.al. | 2510.26843 | null |
| 2025-10-30 | Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models | Yinrong Hong et.al. | 2510.26577 | null |
| 2025-10-30 | Polybasic Speculative Decoding Through a Theoretical Perspective | Ruilin Wang et.al. | 2510.26527 | null |
| 2025-10-30 | In space there will be no need to scream – Limits to the presence of giant planets in the $ζ^2$ Ret system | A. Suárez Mascareño et.al. | 2510.26483 | null |
| 2025-10-30 | ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems | Qiaoling Chen et.al. | 2510.26475 | null |
| 2025-10-29 | Foundations of Fiat-Denominated Loans Collateralized by Cryptocurrencies | Pavel Hubáček et.al. | 2510.25878 | null |
| 2025-10-29 | Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation | Zhi-Kai Chen et.al. | 2510.25739 | null |
| 2025-10-29 | Accurate Leakage Speculation for Quantum Error Correction | Chaithanya Naik Mude et.al. | 2510.25661 | null |
| 2025-10-29 | Detuning Choice for solving MIS and MWIS | Sem Saada Khelkhal et.al. | 2510.25473 | null |
| 2025-10-31 | MMEdge: Accelerating On-device Multimodal Inference via Pipelined Sensing and Encoding | Runxi Huang et.al. | 2510.25327 | null |
| 2025-10-31 | ‘Studies for’: A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model | Chihiro Nagashima et.al. | 2510.25228 | null |
| 2025-10-29 | Prospects for a fourth generation of leptons in a 13 TeV p-p collider | Ramkrishna Joshi et.al. | 2510.25190 | null |
| 2025-10-28 | On the Field Excursion Bound | Tom Rudelius et.al. | 2510.24715 | null |
| 2025-10-28 | MC-SJD : Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration | Junhyuk So et.al. | 2510.24211 | null |
| 2025-10-28 | SpecKD: Speculative Decoding for Effective Knowledge Distillation of LLMs | Haiduo Huang et.al. | 2510.24021 | null |
| 2025-10-27 | Financial markets as a Le Bonian crowd during boom-and-bust episodes: A complementary theoretical framework in behavioural finance | Claire Barraud et.al. | 2510.23175 | null |
| 2025-10-27 | Understanding In-Context Learning Beyond Transformers: An Investigation of State Space and Hybrid Architectures | Shenran Wang et.al. | 2510.23006 | null |
| 2025-10-27 | Exploring Structures of Inferential Mechanisms through Simplistic Digital Circuits | Giovanni Sileno et.al. | 2510.22883 | null |
| 2025-10-26 | Batch Speculative Decoding Done Right | Ranran Haoran Zhang et.al. | 2510.22876 | null |
| 2025-10-26 | FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference | Divya Jyoti Bajpai et.al. | 2510.22641 | null |
| 2025-10-24 | Unravelling the oxygen influence in cubic bixbyite In$_2$O$_3$ on Raman active phonon modes by isotope studies | Johannes Feldl et.al. | 2510.22018 | null |
| 2025-10-24 | Butterfly: glo-cal effects of data, energy and industry, New Media and Performance Exhibition Catalogue | Rebekah Rousi et.al. | 2510.21893 | null |
| 2025-10-23 | Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation | Yuhan Liu et.al. | 2510.20812 | null |
| 2025-10-22 | Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs | Hongyi Liu et.al. | 2510.20064 | null |
| 2025-10-22 | Speculative Sampling for Parametric Temporal Point Processes | Marin Biloš et.al. | 2510.20031 | null |
| 2025-10-22 | New Recursions for the Canonical Scalar-Scaffolded Yang-Mills Amplitude | Jeffrey V. Backus et.al. | 2510.19901 | null |
| 2025-10-22 | AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders | Yuezhou Hu et.al. | 2510.19779 | null |
| 2025-10-23 | Fast Inference via Hierarchical Speculative Decoding | Clara Mohri et.al. | 2510.19705 | null |
| 2025-10-22 | CoSense-LLM: Semantics at the Edge with Cost- and Uncertainty-Aware Cloud-Edge Cooperation | Hasan Akgul et.al. | 2510.19670 | null |
| 2025-10-22 | Fermionic fields of higher spin in de Sitter space | Dionysios Anninos et.al. | 2510.19652 | null |
| 2025-10-21 | Reasoning Language Model Inference Serving Unveiled: An Empirical Study | Qi Li et.al. | 2510.18672 | null |
| 2025-10-21 | From Quarter to All: Accelerating Speculative LLM Decoding via Floating-Point Exponent Remapping and Parameter Sharing | Yushu Zhao et.al. | 2510.18525 | null |
| 2025-10-20 | Annotating the Chain-of-Thought: A Behavior-Labeled Dataset for AI Safety | Antonio-Gabriel Chacón Menke et.al. | 2510.18154 | null |
| 2025-10-20 | A Hall viscosity for skyrmion via magnon interaction | Bom Soo Kim et.al. | 2510.18092 | null |
| 2025-10-20 | SpecAgent: A Speculative Retrieval and Forecasting Agent for Code Completion | George Ma et.al. | 2510.17925 | null |
| 2025-10-18 | Does GenAI Rewrite How We Write? An Empirical Study on Two-Million Preprints | Minfeng Qi et.al. | 2510.17882 | null |
| 2025-10-18 | $ρ$ Hammer: Reviving RowHammer Attacks on New Architectures via Prefetching | Weijie Chen et.al. | 2510.16544 | null |
| 2025-10-18 | What Limits Agentic Systems Efficiency? | Song Bian et.al. | 2510.16276 | null |
| 2025-10-17 | Interpretable RNA-Seq Clustering with an LLM-Based Agentic Evidence-Grounded Framework | Elias Hossain et.al. | 2510.16082 | null |
| 2025-10-29 | TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs | Sibo Xiao et.al. | 2510.15545 | null |
| 2025-10-23 | Accelerating Mobile Language Model via Speculative Decoding and NPU-Coordinated Execution | Zhiyang Chen et.al. | 2510.15312 | null |
| 2025-10-16 | Speculative Model Risk in Healthcare AI: Using Storytelling to Surface Unintended Harms | Xingmeng Zhao et.al. | 2510.14718 | null |
| 2025-10-16 | xLLM Technical Report | Tongxuan Liu et.al. | 2510.14686 | null |
| 2025-10-15 | Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving | Nikos Pagonas et.al. | 2510.14126 | null |
| 2025-10-15 | Tests of restricted Quantum Focusing and a universal CFT bound | Victor Franken et.al. | 2510.13961 | null |
| 2025-10-17 | What Layers When: Learning to Skip Compute in LLMs with Residual Gates | Filipe Laitenberger et.al. | 2510.13876 | null |
| 2025-10-15 | Are Randomized Quantum Linear Systems Solvers Practical? | Siddharth Hariprakash et.al. | 2510.13766 | null |
| 2025-10-15 | Speculating a Tactile Grammar: Toward Task-Aligned Chart Design for Non-Visual Perception | Areen Khalaila et.al. | 2510.13731 | null |
| 2025-10-15 | Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference | Nikhil Bhendawade et.al. | 2510.13161 | null |
| 2025-10-14 | 3-Model Speculative Decoding | Sanghyun Byun et.al. | 2510.12966 | null |
| 2025-10-14 | Language Models Model Language | Łukasz Borchmann et.al. | 2510.12766 | null |
| 2025-10-14 | Notes on false vacuum decay in quantum Ising models | Ian G. Moss et.al. | 2510.12592 | null |
| 2025-10-14 | A Direct Memory Access Controller (DMAC) for Irregular Data Transfers on RISC-V Linux Systems | Thomas Benz et.al. | 2510.12277 | null |
| 2025-10-14 | How Far I’ll Go: Imagining Futures of Conversational AI with People with Visual Impairments Through Design Fiction | Jeanne Choi et.al. | 2510.12268 | null |
| 2025-10-13 | Direct Multi-Token Decoding | Xuan Luo et.al. | 2510.11958 | null |
| 2025-10-13 | New Tests of Low-Scale Quantum Gravity with Cosmic-Ray Collisions | Manuel Ettengruber et.al. | 2510.11879 | null |
| 2025-10-13 | General real-valued theories with the Schröder-Bernstein property are stable | Alexander Berenstein et.al. | 2510.11858 | null |
| 2025-10-13 | The Magic Barrier before Thermalization | Lukas Ebner et.al. | 2510.11681 | null |
| 2025-10-13 | (Dis)Proving Spectre Security with Speculation-Passing Style | Santiago Arranz-Olmos et.al. | 2510.11573 | null |
| 2025-10-14 | AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model | Zhiwei Jin et.al. | 2510.11496 | null |
| 2025-10-13 | Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding | Bingjie Zhu et.al. | 2510.11331 | null |
| 2025-10-11 | SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference | Liangkun Chen et.al. | 2510.10302 | null |
| 2025-10-11 | Exploration of Embodied Space Experience through Umbilical Interaction: A Grounded Theory Approach | Shuai Guo et.al. | 2510.10258 | null |
| 2025-10-11 | LAMOST J064137.77+045743.8: A New Binary of an A7-type Pulsating Subgiant and an M-type Red Dwarf | Yanhui Chen et.al. | 2510.10164 | null |
| 2025-10-11 | Conformal Sparsification for Bandwidth-Efficient Edge-Cloud Speculative Decoding | Payel Bhattacharjee et.al. | 2510.09942 | null |
| 2025-10-10 | Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy | Xiaoxiao Ma et.al. | 2510.09012 | null |
| 2025-10-10 | Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation | Yao Teng et.al. | 2510.08994 | null |
| 2025-10-10 | Mozart: A Chiplet Ecosystem-Accelerator Codesign Framework for Composable Bespoke Application Specific Integrated Circuits | Haoran Jin et.al. | 2510.08873 | null |
| 2025-10-09 | Atomically resolved electron reflectivity at a metal/semiconductor interface | Ding-Ming Huang et.al. | 2510.07970 | null |
| 2025-10-08 | OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs | Jaeseong Lee et.al. | 2510.07535 | null |
| 2025-10-08 | Lectures on entanglement, von Neumann algebras, and emergence of spacetime | Hong Liu et.al. | 2510.07017 | null |
| 2025-10-08 | Simulations of Globular Cluster Evolution with Multiple Stellar Populations | Mirek Giersz et.al. | 2510.06942 | null |
| 2025-10-07 | A Meat-Summer Night’s Dream: A Tangible Design Fiction Exploration of Eating Biohybrid Flying Robots | Ziming Wang et.al. | 2510.06507 | null |
| 2025-10-07 | Back to the Future Museum – Speculative Design for Virtual Citizen-Curated Museums | Richard Rhodes et.al. | 2510.06472 | null |
| 2025-10-06 | Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding | Shrenik Bhansali et.al. | 2510.05421 | null |
| 2025-10-06 | Zigzags and free adjunctions | Lorenzo Riva et.al. | 2510.05371 | null |
| 2025-10-06 | Gromov-Witten theory, degenerations, and the tautological ring | Davesh Maulik et.al. | 2510.04779 | null |
| 2025-10-05 | Speculative Actions: A Lossless Framework for Faster Agentic Systems | Naimeng Ye et.al. | 2510.04371 | null |
| 2025-10-05 | Self Speculative Decoding for Diffusion Large Language Models | Yifeng Gao et.al. | 2510.04147 | null |
| 2025-10-04 | Self-Speculative Masked Diffusions | Andrew Campbell et.al. | 2510.03929 | null |
| 2025-10-04 | Security Analysis of Ponzi Schemes in Ethereum Smart Contracts | Chunyi Zhang et.al. | 2510.03819 | null |
| 2025-10-03 | PrivacyMotiv: Speculative Persona Journeys for Empathic and Motivating Privacy Reviews in UX Design | Zeya Chen et.al. | 2510.03559 | null |
| 2025-10-03 | Action Deviation-Aware Inference for Low-Latency Wireless Robots | Jeyoung Park et.al. | 2510.02851 | null |
| 2025-10-03 | A Concept of Possibility for Real-World Events | Daniel G. Schwartz et.al. | 2510.02655 | null |
| 2025-10-02 | Dispersion in Analogue Gravity | Eren Erberk Erkul et.al. | 2510.02542 | null |
| 2025-10-02 | Impact of AGN and nuclear star formation on the ISM turbulence of galaxies: Insights from JWST/MIRI spectroscopy | Rogemar A. Riffel et.al. | 2510.02517 | null |
| 2025-09-28 | DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding | Guanghao Li et.al. | 2510.02358 | null |
| 2025-10-02 | The Disparate Impacts of Speculative Decoding | Jameson Sandler et.al. | 2510.02128 | null |
| 2025-10-03 | Virtual fibring of manifolds and groups | Dawid Kielak et.al. | 2510.01805 | null |
| 2025-10-01 | Theory is Shapes | Matthew Varona et.al. | 2510.01382 | null |
| 2025-10-01 | HiSpec: Hierarchical Speculative Decoding for LLMs | Avinash Kumar et.al. | 2510.01336 | null |
| 2025-10-01 | Combining complex Langevin dynamics with score-based and energy-based diffusion models | Gert Aarts et.al. | 2510.01328 | null |
| 2025-09-30 | Chiral effects and Joule heating in hot and dense matter | Srimoyee Sen et.al. | 2510.00114 | null |
| 2025-09-29 | A(I)nimism: Re-enchanting the World Through AI-Mediated Object Interaction | Diana Mykhaylychenko et.al. | 2509.25558 | null |
| 2025-09-29 | The Stellar Content of NGC 3603 Revisited: Is the IMF Top Heavy? | Philip Massey et.al. | 2509.25099 | null |
| 2025-09-29 | Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding | Sungkyun Kim et.al. | 2509.24328 | null |
| 2025-09-29 | SpecExit: Accelerating Large Reasoning Model via Speculative Exit | Rubing Yang et.al. | 2509.24248 | null |
| 2025-09-28 | HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models | Zhinan Xie et.al. | 2509.23928 | null |
| 2025-09-27 | SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts | Bingshuai Liu et.al. | 2509.23232 | null |
| 2025-09-29 | SAHM: State-Aware Heterogeneous Multicore for Single-Thread Performance | Shayne Wadle et.al. | 2509.22405 | null |
| 2025-09-26 | In Their Own Words: Reasoning Traces Tailored for Small Models Make Them Better Reasoners | Jaehoon Kim et.al. | 2509.22230 | null |
| 2025-09-26 | Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding | Shijing Hu et.al. | 2509.22134 | null |
| 2025-09-26 | FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning | Yizhou Zhang et.al. | 2509.21792 | null |
| 2025-09-26 | Self-Speculative Biased Decoding for Faster Live Translation | Linxiao Zeng et.al. | 2509.21740 | null |
| 2025-09-25 | SpecMER: Fast Protein Generation with K-mer Guided Speculative Decoding | Thomas Walton et.al. | 2509.21689 | null |
| 2025-09-25 | SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips | Xinyu Lian et.al. | 2509.21271 | null |
| 2025-09-24 | The interstellar heritage of comets | Karen Willacy et.al. | 2509.20530 | null |
| 2025-09-30 | Speculative Safety-Aware Decoding | Xuekang Wang et.al. | 2508.17739 | null |
| 2025-08-07 | Hierarchical Verification of Speculative Beams for Accelerating LLM Inference | Jaydip Sen et.al. | 2508.03726 | null |
| 2025-07-22 | Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges | Senyao Li et.al. | 2507.16731 | null |
| 2025-07-22 | Enhancing Compiler Optimization Efficiency through Grammatical Decompositions of Control-Flow Graphs | Xuran Cai et.al. | 2507.16660 | null |
| 2025-07-22 | Ly$α$ Emission from [OIII] Emitters Near Reionization: The role of environment in galaxy Ly$α$ detection | Seyedazim Hashemi et.al. | 2507.16231 | null |
| 2025-07-20 | Designing Robots with, not for: A Co-Design Framework for Empowering Interactions in Forensic Psychiatry | Qiaoqiao Ren et.al. | 2507.14931 | null |
| 2025-07-18 | On the asymptotic equidistribution of word values in symmetric groups | Vadim Alekseev et.al. | 2507.13928 | null |
| 2025-07-22 | Gravity and the Higgs boson mass | Carlo Branchina et.al. | 2507.13832 | null |
| 2025-07-16 | Modeling Feasible Locomotion of Nanobots for Cancer Detection and Treatment | Noble Harasha et.al. | 2507.12400 | null |
| 2025-07-16 | Efficient Control Flow Attestation by Speculating on Control Flow Path Representations | Liam Tyler et.al. | 2507.12345 | null |
| 2025-07-17 | DSSD: Efficient Edge-Device LLM Deployment and Collaborative Inference via Distributed Split Speculative Decoding | Jiahong Ning et.al. | 2507.12000 | null |
| 2025-07-16 | Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential | Mohammad Samragh et.al. | 2507.11851 | null |
| 2025-07-16 | Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI | Samyam Rajbhandari et.al. | 2507.11830 | null |
| 2025-07-14 | Exploring ultra-high energy neutrino experiments through the lens of the transport equation | Stefano Palmisano et.al. | 2507.10665 | null |
| 2025-07-14 | Large Interconnected Thermodynamic Systems Nearly Minimize Entropy Production | Kyle J. Ray et.al. | 2507.10476 | null |
| 2025-07-14 | Supernova-induced binary-interaction-powered supernovae: a model for SN2022jli | Ryosuke Hirai et.al. | 2507.09974 | null |
| 2025-07-12 | TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding | Shukai Gong et.al. | 2507.09252 | null |
| 2025-07-21 | Bringing the Norma Dark Cloud to Light in X-rays | Stephen L. Skinner et.al. | 2507.09047 | null |
| 2025-07-11 | On Evaluating Performance of LLM Inference Serving Systems | Amey Agrawal et.al. | 2507.09019 | null |
| 2025-07-10 | Greening Schoolyards and the Spatial Distribution of Property Values in Denver, Colorado | Mahshid Gorjian et.al. | 2507.08894 | null |
| 2025-07-11 | BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity | Chenyang Song et.al. | 2507.08771 | null |
| 2025-07-11 | Time Variation in the TeV Cosmic Ray Anisotropy with IceCube and Energy Dependence of the Solar Dipole | Perri Zilberman et.al. | 2507.08242 | null |
| 2025-07-08 | Optically Overluminous Tidal Disruption Events: Outflow Properties and Implications for Extremely Relativistic Disruptions | Yuhan Yao et.al. | 2507.06453 | null |
| 2025-07-08 | Experiments to test the hypothesis for solar and dark matter axions | Babette Döbrich et.al. | 2507.06414 | null |
| 2025-07-08 | Supernovae from stellar mergers and accretors of binary mass transfer: Implications for Type IIP, 1987A-like and interacting supernovae | F. R. N. Schneider et.al. | 2507.06391 | null |
| 2025-07-08 | Bouncing Grains Keep Protoplanetary Disks Bright | Yansong Qian et.al. | 2507.06298 | null |
| 2025-07-08 | Tropical Donagi theorem | Felix Röhrle et.al. | 2507.05987 | null |
| 2025-07-04 | Impact of flavor condensate dark matter on accretion disk luminosity in spherical spacetimes | Antonio Capolupo et.al. | 2507.03758 | null |
| 2025-06-18 | Evolution, Future of AI, and Singularity | Zeki Doruk Erden et.al. | 2507.02876 | null |
| 2025-07-03 | NVIDIA GPU Confidential Computing Demystified | Zhongshu Gu et.al. | 2507.02770 | null |
| 2025-07-03 | OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding | Ramchalam Kinattinkara Ramakrishnan et.al. | 2507.02659 | null |
| 2025-07-03 | High-Order Deep Meta-Learning with Category-Theoretic Interpretation | David H. Mguni et.al. | 2507.02634 | null |
| 2025-07-14 | FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference | Xing Liu et.al. | 2507.02620 | null |
| 2025-07-02 | H.E.S.S. programme searching for VHE gamma rays associated with FRBs | F. Aharonian et.al. | 2507.02143 | null |
| 2025-07-07 | Handling out-of-order input arrival in CEP engines on the edge combining optimistic, pessimistic and lazy evaluation | Styliani Kyrama et.al. | 2507.01461 | null |
| 2025-07-02 | LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation | Tianyu Liu et.al. | 2507.01449 | null |
| 2025-07-01 | Quantize-Sample-and-Verify: LLM Acceleration via Adaptive Edge-Cloud Speculative Decoding | Guangyi Zhang et.al. | 2507.00605 | null |
| 2025-06-30 | User Concerns Regarding Social Robots for Mood Regulation: A Case Study on the “Sunday Blues” | Zhuochao Peng et.al. | 2507.00271 | null |
| 2025-07-08 | Fully Parallelized BP Decoding for Quantum LDPC Codes Can Outperform BP-OSD | Ming Wang et.al. | 2507.00254 | null |
| 2025-06-30 | Metal-poor single Wolf-Rayet stars: the interplay of optically thick winds and rotation | Lumen Boco et.al. | 2507.00137 | null |
| 2025-06-30 | Segmented Operations using Matrix Multiplications | Aleksandros Sobczyk et.al. | 2506.23906 | null |
| 2025-06-29 | From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows | Mohamed Amine Ferrag et.al. | 2506.23260 | null |
| 2025-06-28 | Polar alignment of a circumbinary disc around a brown dwarf binary | Jeremy L. Smallwood et.al. | 2506.22747 | null |
| 2025-07-03 | VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs | Raghavv Goel et.al. | 2506.22694 | null |
| 2025-06-27 | QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization | Danush Khanna et.al. | 2506.22396 | null |
| 2025-07-10 | Cool Gas in the Circumgalactic Medium of Massive Post Starburst Galaxies | Zoe Harvey et.al. | 2506.22287 | null |
| 2025-06-26 | Small Encoders Can Rival Large Decoders in Detecting Groundedness | Istabrak Abbes et.al. | 2506.21288 | null |
| 2025-06-26 | You never have enough J/$ψ$ events: the case for a J/$ψ$ factory | Stephen Lars Olsen et.al. | 2506.20975 | null |
| 2025-06-17 | Utility-Driven Speculative Decoding for Mixture-of-Experts | Anish Saxena et.al. | 2506.20675 | null |
| 2025-07-09 | Charged rotating quantum black holes | Dyuman Bhattacharya et.al. | 2506.19941 | null |
| 2025-06-23 | Entangled Quantum Negative Energy Teleportation as a Probe of Semiclassical Gravity | Daniel S. Zachary et.al. | 2506.19878 | null |
| 2025-06-24 | Scaling Speculative Decoding with Lookahead Reasoning | Yichao Fu et.al. | 2506.19830 | null |
| 2025-06-23 | LLMs on a Budget? Say HOLA | Zohaib Hasan Siddiqui et.al. | 2506.18952 | null |
| 2025-07-10 | The Full Nonlinear Vortex Tube-Vorton Method: the post-stall condition | Jesus Carlos Pimentel-Garcia et.al. | 2506.18719 | null |
| 2025-06-17 | Semantic uncertainty in advanced decoding methods for LLM generation | Darius Foodeei et.al. | 2506.17296 | null |
| 2025-07-08 | Capturing Misalignment | Pierfrancesco Guarino et.al. | 2506.17176 | null |
| 2025-06-20 | ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models | Bin Chen et.al. | 2506.16712 | null |
| 2025-07-02 | Rethinking LLM Training through Information Geometry and Quantum Metrics | Riccardo Di Sipio et.al. | 2506.15830 | null |
| 2025-06-15 | $\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts | Mert Cemri et.al. | 2506.15733 | null |
| 2025-06-18 | CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies | Donghyun Gouk et.al. | 2506.15601 | null |
| 2025-06-18 | PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction | Shufan Li et.al. | 2506.15556 | null |
| 2025-06-17 | Optimistic MEV in Ethereum Layer 2s: Why Blockspace Is Always in Demand | Ozan Solmaz et.al. | 2506.14768 | null |
| 2025-06-17 | S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models | Tao He et.al. | 2506.14158 | null |
| 2025-06-16 | Stimulus Motion Perception Studies Imply Specific Neural Computations in Human Visual Stabilization | David W Arathorn et.al. | 2506.13506 | null |
| 2025-06-21 | Exploring the Secondary Risks of Large Language Models | Jiawei Chen et.al. | 2506.12382 | null |
| 2025-06-14 | Quantum Machine Learning | Muhammad Usman et.al. | 2506.12292 | null |
| 2025-06-13 | Fluid-induced snap-through instability of spherical shells | Pier Giuseppe Ledda et.al. | 2506.12247 | null |
| 2025-06-13 | Eliciting Reasoning in Language Models with Cognitive Tools | Brown Ebouky et.al. | 2506.12115 | null |
| 2025-06-12 | SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding | Ziyi Zhang et.al. | 2506.11309 | null |
| 2025-06-11 | Speculative Design in Spiraling Time: Methods and Indigenous HCI | James Eschrich et.al. | 2506.10229 | null |
| 2025-06-11 | V455 Car: an oscillating eclipsing Algol-type binary in triple star system | Zhao-Long Deng et.al. | 2506.10124 | null |
| 2025-06-11 | Patterns of Patterns III | Joseph Corneli et.al. | 2506.09696 | null |
| 2025-07-13 | SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving | Xiangchen Li et.al. | 2506.09397 | null |
| 2025-06-11 | A collection of results relating the geometry of plane domains and the exit time of planar Brownian motion, II | Greg Markowsky et.al. | 2506.09364 | null |
| 2025-07-19 | Draft-based Approximate Inference for LLMs | Kevin Galim et.al. | 2506.08373 | link |
| 2025-06-10 | Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/7})$ Second-Order Oracle Complexity | Lesi Chen et.al. | 2506.08362 | null |
| 2025-06-09 | MiniCPM4: Ultra-Efficient LLMs on End Devices | MiniCPM Team et.al. | 2506.07900 | link |
| 2025-06-09 | FREESS: An Educational Simulator of a RISC-V-Inspired Superscalar Processor Based on Tomasulo’s Algorithm | Roberto Giorgi et.al. | 2506.07665 | link |
| 2025-06-09 | LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments | Jin Huang et.al. | 2506.07416 | null |
| 2025-06-08 | Exploiting Inaccurate Branch History in Side-Channel Attacks | Yuhui Zhu et.al. | 2506.07263 | null |
| 2025-06-07 | Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit | Charles Goddard et.al. | 2506.06607 | null |
| 2025-06-06 | Fake Friends and Sponsored Ads: The Risks of Advertising in Conversational Search | Jacob Erickson et.al. | 2506.06447 | null |
| 2025-07-08 | On the Fundamental Impossibility of Hallucination Control in Large Language Models | Michał P. Karpowicz et.al. | 2506.06382 | null |
| 2025-06-06 | Disk Evolution Study Through Imaging of Nearby Young Stars (DESTINYS): Evidence of planet-disk interaction in the 2MASSJ16120668-3010270 system | C. Ginski et.al. | 2506.05892 | null |
| 2025-06-10 | Gumbel-max List Sampling for Distribution Coupling with Multiple Samples | Joseph Rowan et.al. | 2506.05632 | null |
| 2025-06-05 | Accelerated Test-Time Scaling with Model-Free Speculative Sampling | Woomin Song et.al. | 2506.04708 | null |
| 2025-06-04 | Guided Speculative Inference for Efficient Test-Time Alignment of LLMs | Jonathan Geuter et.al. | 2506.04118 | link |
| 2025-06-04 | The Causal-Noncausal Tail Processes: An Introduction | Christian Gouriéroux et.al. | 2506.04046 | null |
| 2025-06-04 | AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism | Zhepei Wei et.al. | 2506.03700 | link |
| 2025-06-04 | POSS: Position Specialist Generates Better Draft for Speculative Decoding | Langlin Huang et.al. | 2506.03566 | link |
| 2025-06-02 | Out-of-Vocabulary Sampling Boosts Speculative Decoding | Nadav Timor et.al. | 2506.03206 | null |
| 2025-06-03 | Feedstack: Layering Structured Representations over Unstructured Feedback to Scaffold Human AI Conversation | Hannah Vy Nguyen et.al. | 2506.03052 | null |
| 2025-06-03 | Reuse or Generate? Accelerating Code Editing via Edit-Oriented Speculative Decoding | Peiding Wang et.al. | 2506.02780 | null |
| 2025-06-28 | Multi Layered Autonomy and AI Ecologies in Robotic Art Installations | Baoyang Chen et.al. | 2506.02606 | null |
| 2025-06-03 | Consultant Decoding: Yet Another Synergistic Mechanism | Chuanghao Ding et.al. | 2506.02391 | null |
| 2025-06-02 | Radiation GRMHD Models of Accretion onto Stellar-Mass Black Holes: I. Survey of Eddington Ratios | Lizhong Zhang et.al. | 2506.02289 | null |
| 2025-05-16 | SpecMemo: Speculative Decoding is in Your Pocket | Selin Yildirim et.al. | 2506.01986 | null |
| 2025-05-16 | Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism | Yuhao Shen et.al. | 2506.01979 | null |
| 2025-06-02 | Synchronic Web Digital Identity: Speculations on the Art of the Possible | Thien-Nam Dinh et.al. | 2506.01856 | null |
| 2025-07-04 | Playing with Transformer at 30+ FPS via Next-Frame Diffusion | Xinle Cheng et.al. | 2506.01380 | null |
| 2025-06-02 | Shape Shifting Light Dark Matter Solitons | Dor Ben-Amotz et.al. | 2506.01282 | null |
| 2025-06-01 | The $M_{\rm BH}-M_\star$ Relation of the hyperluminous Dust-obscured Quasars up to $z \sim 4$ | Yibin Luo et.al. | 2506.01218 | null |
| 2025-06-01 | Mamba Drafters for Speculative Decoding | Daewon Choi et.al. | 2506.01206 | null |
| 2025-06-01 | The Inverse Scaling Effect of Pre-Trained Language Model Surprisal Is Not Due to Data Leakage | Byung-Doh Oh et.al. | 2506.01172 | null |
| 2025-05-31 | Accelerating Diffusion LLMs via Adaptive Parallel Decoding | Daniel Israel et.al. | 2506.00413 | null |
| 2025-05-31 | Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively | Jiawei Gu et.al. | 2506.00396 | link |
| 2025-05-30 | Cross-Attention Speculative Decoding | Wei Zhong et.al. | 2505.24544 | null |
| 2025-05-30 | CLaSp: In-Context Layer Skip for Self-Speculative Decoding | Longze Chen et.al. | 2505.24196 | null |
| 2025-06-10 | Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism | Jinhui Wei et.al. | 2505.23219 | null |
| 2025-05-28 | Pre-Training Curriculum for Multi-Token Prediction in Language Models | Ansar Aynetdinov et.al. | 2505.22757 | link |
| 2025-05-28 | Mass-feeding of jet-launching white dwarfs in grazing and common envelope evolution | Noam Soker et.al. | 2505.22621 | null |
| 2025-05-29 | Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design | Yudi Zhang et.al. | 2505.22179 | link |
| 2025-05-28 | RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding | Yuichiro Hoshino et.al. | 2505.22135 | null |
| 2025-05-28 | Robust and Symmetric Magnetic Field Dependency of Superconducting Diode Effect in Asymmetric Dirac Semimetal SQUIDs | H. C. Travaglini et.al. | 2505.21861 | null |
| 2025-05-27 | Computocene: Notes from an Age of Observation | Simone Severini et.al. | 2505.21744 | null |
| 2025-05-27 | Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits | Yeshwanth Venkatesha et.al. | 2505.21594 | null |
| 2025-05-27 | Hardware-Efficient Attention for Fast Decoding | Ted Zadouri et.al. | 2505.21487 | null |
| 2025-05-27 | Pair binding and Hund’s rule breaking in high-symmetry fullerenes | R. Rausch et.al. | 2505.21455 | null |
| 2025-05-28 | Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity | Yehui Tang et.al. | 2505.21411 | null |
| 2025-05-27 | Repeated Auctions with Speculators: Arbitrage Incentives and Forks in DAOs | Nicolas Eschenbaum et.al. | 2505.21296 | null |
| 2025-05-27 | SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences | Jungyoub Cha et.al. | 2505.20776 | link |
| 2025-05-27 | Replication of Reference-Dependent Preferences and the Risk-Return Trade-Off in the Chinese Market | Penggan Xu et.al. | 2505.20608 | null |
| 2025-05-26 | Academic Research Output Derivatives: Structuring Futures and Options on Research Output Index | Amarendra Sharma et.al. | 2505.20492 | null |
| 2025-05-26 | Bounded cohomology, quotient extensions, and hierarchical hyperbolicity | Francesco Fournier-Facio et.al. | 2505.20462 | null |
| 2025-05-26 | HAMburger: Accelerating LLM Inference via Token Smashing | Jingyu Liu et.al. | 2505.20438 | null |
| 2025-05-23 | Reinforcement Speculative Decoding for Fast Ranking | Yingpeng Du et.al. | 2505.20316 | null |
| 2025-06-13 | MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE | Zongle Huang et.al. | 2505.19645 | null |
| 2025-05-28 | Faster and Better LLMs via Latency-Aware Test-Time Scaling | Zili Wang et.al. | 2505.19634 | null |
| 2025-07-23 | Turing Test 2.0: The General Intelligence Threshold | Georgios Mappouras et.al. | 2505.19550 | null |
| 2025-05-29 | DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding | Yunhai Hu et.al. | 2505.19201 | link |
| 2025-05-25 | Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs | Xuan Zhang et.al. | 2505.19155 | null |
| 2025-05-24 | Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding | Yixuan Wang et.al. | 2505.18629 | null |
| 2025-05-23 | VeriThinker: Learning to Verify Makes Reasoning Model Efficient | Zigeng Chen et.al. | 2505.17941 | link |
| 2025-05-20 | Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency | Ruixiao Li et.al. | 2505.17074 | null |
| 2025-05-16 | SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs | Jinwoo Park et.al. | 2505.17052 | null |
| 2025-05-22 | KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization | Mingbo Song et.al. | 2505.16162 | null |
| 2025-05-21 | Strong Hilbert space fragmentation and fractons from subsystem and higher-form symmetries | Charles Stahl et.al. | 2505.15889 | null |
| 2025-05-21 | Quasinormal Modes of Schwarzschild Black Holes in the Dehnen-(1, 4, 5/2) Type Dark Matter Halos | Qi-Qi Liang et.al. | 2505.15540 | null |
| 2025-06-03 | Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding | Zijian Lin et.al. | 2505.15380 | null |
| 2025-05-21 | SSR: Speculative Parallel Scaling Reasoning in Test-time | Yuanlin Chu et.al. | 2505.15340 | null |
| 2025-05-21 | BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms | Yunlong Hou et.al. | 2505.15141 | null |
| 2025-05-20 | STree: Speculative Tree Decoding for Hybrid State-Space Models | Yangchao Wu et.al. | 2505.14969 | null |
| 2025-05-20 | On the Day They Experience: Awakening Self-Sovereign Experiential AI Agents | Botao Amber Hu et.al. | 2505.14893 | null |
| 2025-05-20 | Unremarkable to Remarkable AI Agent: Exploring Boundaries of Agent Intervention for Adults With and Without Cognitive Impairment | Mai Lee Chang et.al. | 2505.14872 | null |
| 2025-05-20 | X-ray properties of compact elliptical galaxies | Orsolya E. Kovacs et.al. | 2505.14768 | null |
| 2025-05-20 | Speculative Decoding Reimagined for Multimodal Large Language Models | Luxi Lin et.al. | 2505.14260 | link |
| 2025-05-19 | Language and Thought: The View from LLMs | Daniel Rothschild et.al. | 2505.13561 | null |
| 2025-05-19 | HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding | Siran Liu et.al. | 2505.13254 | null |
| 2025-09-15 | Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification | Jikai Wang et.al. | 2505.13204 | null |
| 2025-05-19 | FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference | Guangda Liu et.al. | 2505.13109 | null |
| 2025-05-25 | FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | Zihua Wang et.al. | 2505.12728 | link |
| 2025-05-18 | Traversal Verification for Speculative Tree Decoding | Yepeng Weng et.al. | 2505.12398 | null |
| 2025-05-16 | FAIR Ecosystems for Science at Scale | Sean R. Wilkinson et.al. | 2505.11742 | null |
| 2025-05-16 | Prime Number Error Terms | Nathan Ng et.al. | 2505.11295 | null |
| 2025-05-16 | Beyond surfaces: quantifying internal radiative heat transport in dense materials | Janak Tiwari et.al. | 2505.10853 | null |
| 2025-05-16 | Qualia Optimization | Philip S. Thomas et.al. | 2505.10779 | null |
| 2025-07-10 | Anchoring AI Capabilities in Market Valuations: The Capability Realization Rate Model and Valuation Misalignment Risk | Xinmin Fang et.al. | 2505.10590 | null |
| 2025-05-18 | MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models | Mugilan Ganesan et.al. | 2505.10526 | null |
| 2025-05-21 | SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices | Xiangwen Zhuge et.al. | 2505.10259 | link |
| 2025-05-14 | Chandra Rules Out Super-Eddington Accretion For Little Red Dots | Andrea Sacchi et.al. | 2505.09669 | null |
| 2025-06-28 | Extended Structural Dynamics – Emergent Irreversibility from Reversible Dynamics | Patrick BarAvi et.al. | 2505.09650 | null |
| 2025-05-14 | Observational study of the formation of homologous confined circular-ribbon flares | Shuhong Yang et.al. | 2505.09093 | null |
| 2025-05-13 | Long timescale numerical simulations of large, super-critical accretion discs | P. Chris Fragile et.al. | 2505.08859 | null |
| 2025-05-13 | Kudzu: Fast and Simple High-Throughput BFT | Victor Shoup et.al. | 2505.08771 | null |
| 2025-05-13 | Automatic Task Detection and Heterogeneous LLM Speculative Decoding | Danying Ge et.al. | 2505.08600 | null |
| 2025-05-12 | GUP Effective Metric Without GUP: Implications for the Sign of GUP Parameter and Quantum Bounce | Yen Chin Ong et.al. | 2505.07972 | null |
| 2025-05-12 | Localized Gravity, de Sitter, and the Horizon Criterion | Bjoern Friedrich et.al. | 2505.07934 | null |
| 2025-06-22 | TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking | Ching Nam Hang et.al. | 2505.07891 | null |
| 2025-05-08 | Scaling Laws for Speculative Decoding | Siyuan Yan et.al. | 2505.07858 | null |
| 2025-05-12 | SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models | Hang Wu et.al. | 2505.07680 | null |
| 2025-05-10 | N-body simulations of the Self-Confinement of Viscous Self-Gravitating Narrow Eccentric Planetary Ringlets | Joseph M. Hahn et.al. | 2505.06639 | null |
| 2025-05-09 | FastDup: a scalable duplicate marking tool using speculation-and-test mechanism | Zhonghai Zhang et.al. | 2505.06127 | link |
| 2025-05-08 | A Physics Model for Origin of Life | Paul Howard Frampton et.al. | 2505.05634 | null |
| 2025-05-08 | Memory Under Siege: A Comprehensive Survey of Side-Channel Attacks on Memory | MD Mahady Hassan et.al. | 2505.04896 | null |
| 2025-05-08 | Topological phase transition to a hidden charge density wave liquid | Joshua S. H. Lee et.al. | 2505.04867 | null |
| 2025-05-07 | SOAEsV2-7B/72B: Full-Pipeline Optimization for State-Owned Enterprise LLMs via Continual Pre-Training, Domain-Progressive SFT and Distillation-Enhanced Speculative Decoding | Jingyang Deng et.al. | 2505.04723 | null |
| 2025-05-06 | Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation | Hengyuan Hu et.al. | 2505.03983 | null |
| 2025-05-06 | QiMeng-CPU-v2: Automated Superscalar Processor Design by Learning Data Dependencies | Shuyao Cheng et.al. | 2505.03195 | null |
| 2025-05-04 | The quest for explosive bubbles in the Indonesian Rupiah/US exchange rate: Does the uncertainty trinity matter? | Abdul Khaliq et.al. | 2505.02869 | null |
| 2025-05-24 | Accelerating Large Language Model Reasoning via Speculative Search | Zhihai Wang et.al. | 2505.02865 | null |
| 2025-05-21 | Dirac Singleton as a Relativistic Field Beyond Standard Model | M. A. Vasiliev et.al. | 2505.01915 | null |
| 2025-05-03 | Speculative Evolution Through 3D Cellular Automata | Amir Hossein Khazaei et.al. | 2505.01692 | null |
| 2025-05-02 | PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding | Bradley McDanel et.al. | 2505.01572 | null |
| 2025-05-12 | Emotions in Artificial Intelligence | Hermann Borotschnig et.al. | 2505.01462 | null |
| 2025-04-29 | X-ray Spectroscopy via Temporal Decomposition | William Setterberg et.al. | 2504.21169 | null |
| 2025-07-02 | Ground to Dust: Collisional Cascades and the Fate of Kardashev II Megaswarms | Brian C. Lacki et.al. | 2504.21151 | null |
| 2025-06-10 | EvoPort: An Evolutionary Framework for Portfolio Optimization via Randomized Alpha Discovery and Ensemble-Based Allocation | Nguyen Van Thanh et.al. | 2504.21095 | null |
| 2025-04-29 | Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding | Gabe Guo et.al. | 2504.20456 | link |
| 2025-04-28 | AutoJudge: Judge Decoding Without Manual Annotation | Roman Garipov et.al. | 2504.20039 | null |
| 2025-04-27 | Detecting speculative data flow vulnerabilities using weakest precondition reasoning | Graeme Smith et.al. | 2504.19128 | null |
| 2025-05-25 | Efficient Reasoning for LLMs through Speculative Chain-of-Thought | Jikai Wang et.al. | 2504.19095 | link |
| 2025-04-26 | Global Simulations of Gravitational Instability in Protostellar Disks with Full Radiation Transport II. Locality of Gravitoturbulence, Clumpy Spirals, and Implications for Observable Substructure | Wenrui Xu et.al. | 2504.18751 | null |
| 2025-06-15 | PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation | Zihao An et.al. | 2504.18583 | null |
| 2025-04-25 | Generalizing the relativistic precession model of quasi-periodic oscillations through anharmonic corrections | Roberto Giambò et.al. | 2504.18403 | null |
| 2025-04-23 | A Vision for AI-Driven Adaptation of Dynamic AR Content to Users and Environments | Julian Rasch et.al. | 2504.16562 | null |
| 2025-04-23 | Hardness of Median and Center in the Ulam Metric | Nick Fischer et.al. | 2504.16437 | null |
| 2025-04-22 | On commuting integer matrices | Jonathan Chapman et.al. | 2504.15839 | null |
| 2025-04-22 | Delayed Keen Model with Inflation | Ali Tolga Dincer et.al. | 2504.15819 | null |
| 2025-04-23 | Speculative Sampling via Exponential Races | Szymon Kobus et.al. | 2504.15475 | null |
| 2025-05-16 | Rendezvous in CAVITY: Kinematics and gas properties of an isolated dwarf-dwarf merging pair in a cosmic void region | Bahar Bidaran et.al. | 2504.15359 | null |
| 2025-04-21 | The phase diagram of CeRh$_2$As$_2$ for out-of-plane magnetic field | P. Khanenko et.al. | 2504.15112 | null |
| 2025-04-21 | Safety Co-Option and Compromised National Security: The Self-Fulfilling Prophecy of Weakened AI Risk Thresholds | Heidy Khlaaf et.al. | 2504.15088 | null |
| 2025-04-21 | Note on Type $III_1$ Algebras in $c=1$ String Theory and Bulk Causal Diamonds | T. Banks et.al. | 2504.15076 | null |
| 2025-04-21 | Exploring Collaborative GenAI Agents in Synchronous Group Settings: Eliciting Team Perceptions and Design Considerations for the Future of Work | Janet G. Johnson et.al. | 2504.14779 | null |
| 2025-05-27 | BLACKOUT: Data-Oblivious Computation with Blinded Capabilities | Hossam ElAtali et.al. | 2504.14654 | null |
| 2025-04-25 | UFO2: The Desktop AgentOS | Chaoyun Zhang et.al. | 2504.14603 | link |
| 2025-04-20 | An interstellar mission to test astrophysical black holes | Cosimo Bambi et.al. | 2504.14576 | null |
| 2025-04-19 | Charge Densities in Crystals and Triply-Periodic Minimal Surfaces | Mengdi Yin et.al. | 2504.14148 | null |
| 2025-04-18 | Going Whole Hog: A Philosophical Defense of AI Cognition | Herman Cappelen et.al. | 2504.13988 | null |
| 2025-04-16 | From job titles to jawlines: Using context voids to study generative AI systems | Shahan Ali Memon et.al. | 2504.13947 | null |
| 2025-03-21 | Bio-crafting Architecture: Experiences of growing mycelium in minimal surface molds | Anca-Simona Horvath et.al. | 2504.13855 | null |
| 2025-05-28 | The Sky as a Killing Horizon | Níckolas de Aguiar Alves et.al. | 2504.12514 | null |
| 2025-04-12 | Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time | Wang Yang et.al. | 2504.12329 | link |
| 2025-04-18 | Large Language Model-Based Knowledge Graph System Construction for Sustainable Development Goals: An AI-Based Speculative Design Perspective | Yi-De Lin et.al. | 2504.12309 | null |
| 2025-04-16 | Purposefully Induced Psychosis (PIP): Embracing Hallucination as Imagination in Large Language Models | Kris Pilcher et.al. | 2504.12012 | null |
| 2025-04-16 | Who Said Only Military Officers Can Deal with Uncertainty? On the Importance of Uncertainty in EdTech Data Visualisations | Felicitas Macgilchrist et.al. | 2504.11974 | null |
| 2025-04-15 | Five dimensional rotating and Quintessence black hole and their shadows | Milko Estrada et.al. | 2504.11408 | null |
| 2025-04-16 | Efficient Distributed Retrieval-Augmented Generation for Enhancing Language Model Performance | Shangyu Liu et.al. | 2504.11197 | null |
| 2025-04-14 | Shield Bash: Abusing Defensive Coherence State Retrieval to Break Timing Obfuscation | Kartik Ramkrishnan et.al. | 2504.10318 | null |
| 2025-04-14 | Gravitational metamaterials from optical properties of spacetime media | Orlando Luongo et.al. | 2504.09987 | null |
| 2025-04-12 | Authoritarian Recursions: How Fiction, History, and AI Reinforce Control in Education, Warfare, and Discourse | Hasan Oguz et.al. | 2504.09030 | null |
| 2025-04-11 | SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting | Jiaming Xu et.al. | 2504.08850 | null |
| 2025-05-31 | SD$^2$: Self-Distilled Sparse Drafters | Mike Lasby et.al. | 2504.08838 | null |
| 2025-04-05 | SLOs-Serve: Optimized Serving of Multi-SLO LLMs | Siyuan Chen et.al. | 2504.08784 | null |
| 2025-04-11 | Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices | Shengyuan Ye et.al. | 2504.08242 | null |
| 2025-05-16 | SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning | Rui Pan et.al. | 2504.07891 | link |
| 2025-04-10 | Synthetic Fluency: Hallucinations, Confabulations, and the Creation of Irish Words in LLM-Generated Translations | Sheila Castilho et.al. | 2504.07680 | null |
| 2025-04-10 | Proceedings of the Purposeful XR Workshop for CHI 2025 | Elizabeth Childs et.al. | 2504.07475 | null |
| 2025-04-09 | Joint Survey Processing. III. Compact Oddballs in the COSMOS Field – Little Red Dots and Transients | Yu-Heng Lin et.al. | 2504.07196 | null |
| 2025-04-09 | ShadowBinding: Realizing Effective Microarchitectures for In-Core Secure Speculation Schemes | Amund Bergland Kvalsvik et.al. | 2504.07018 | null |
| 2025-04-08 | SPIRe: Boosting LLM Inference Throughput with Speculative Decoding | Sanjit Neelam et.al. | 2504.06419 | null |
| 2025-04-08 | Decoding the Ishango Bone: Unveiling Prehistoric Mathematical Art | Jenny Baur et.al. | 2504.06412 | null |
| 2025-04-08 | Interplay between trimer structure and magnetic ground state in Ba5Ru3O12 probed by Neutron and muSR techniques | E. Kushwaha et.al. | 2504.06113 | null |
| 2025-04-08 | Strong Evidence That Abiogenesis Is a Rapid Process on Earth Analogs | David Kipping et.al. | 2504.05993 | null |
| 2025-04-08 | DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding | Hossein Entezari Zarch et.al. | 2504.05598 | null |
| 2025-06-03 | Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution | Raffi Khatchadourian et.al. | 2504.05424 | null |
| 2025-04-06 | pc-COP: An Efficient and Configurable 2048-p-Bit Fully-Connected Probabilistic Computing Accelerator for Combinatorial Optimization | Kiran Magar et.al. | 2504.04543 | null |
| 2025-06-02 | Representations of $p$-adic groups and orbits with smooth closure in a variety of Langlands parameters | Kristaps Balodis et.al. | 2504.04163 | null |
| 2025-04-05 | PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models | Haofei Yin et.al. | 2504.04104 | null |
| 2025-03-23 | Agentic Business Process Management: The Past 30 Years And Practitioners’ Future Perspectives | Hoang Vu et.al. | 2504.03693 | null |
| 2025-04-04 | Ethics Readiness of Technology: The case for aligning ethical approaches with technological maturity | Eline de Jong et.al. | 2504.03336 | null |
| 2025-04-03 | A Review of Prototyping in XR: Linking Extended Reality to Digital Fabrication | Bixun Chen et.al. | 2504.02998 | null |
| 2025-05-02 | GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation | Zhiyuan Yan et.al. | 2504.02782 | link |
| 2025-04-03 | Black Holes, Moduli Stabilisation and the Swampland | Matilda Delgado et.al. | 2504.02645 | null |
| 2025-04-08 | Variational Online Mirror Descent for Robust Learning in Schrödinger Bridge | Dong-Sig Han et.al. | 2504.02618 | null |
| 2025-06-16 | Graviton Scattering on Gravitational Atoms: Relic Graviton Shot Noise | Benjamin Avila-Lopez et.al. | 2504.01286 | null |
| 2025-04-01 | Reminiscences about Steven Weinberg (This Time it’s Personal) | C. P. Burgess et.al. | 2504.01118 | null |
| 2025-04-01 | Mesoscale Eddy – Internal Wave Coupling. III. The End of the Enstrophy Cascade and Maintenance of Gyre Scale Potential Vorticity Gradients | Kurt L. Polzin et.al. | 2504.00486 | null |
| 2025-04-01 | The Impact of Triangular-Toothed Gears on the Functionality of the Antikythera Mechanism | Esteban Guillermo Szigety y Gustavo Francisco Arenas et.al. | 2504.00327 | null |
| 2025-06-04 | Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding | Aayush Gautam et.al. | 2504.00030 | null |
| 2025-03-31 | What the F*ck Is Artificial General Intelligence? | Michael Timothy Bennett et.al. | 2503.23923 | null |
| 2025-03-31 | A search for the three isomers of cyano-1,3-butadiene in TMC-1: Implications for bottom-up routes involving 1,3-butadiene | M. Agundez et.al. | 2503.23841 | null |
| 2025-03-30 | Credit, Land Speculation, and Low-Interest-Rate Policy | Tomohiro Hirano et.al. | 2503.23552 | null |
| 2025-03-30 | The Longest Duration SGRE Event in Solar Cycle 25 | Nat Gopalswamy et.al. | 2503.23544 | null |
| 2025-03-30 | Speculative End-Turn Detector for Efficient Speech Chatbot Assistant | Hyunjong Ok et.al. | 2503.23439 | null |
| 2025-03-29 | Beyond speculation: Measuring the growing presence of LLM-generated texts in multilingual disinformation | Dominik Macko et.al. | 2503.23242 | null |
| 2025-03-28 | Formation and Evolution of Compact Binaries Containing Intermediate Mass Black Holes in Dense Star Clusters | Seungjae Lee et.al. | 2503.22109 | null |
| 2025-03-27 | How to Constrain the Stochastic Gravitational Wave Background with Multi-Frequency Detections | Eleanor Gleave et.al. | 2503.21508 | null |
| 2025-03-26 | Speculations on higher Fukaya categories | James Pascaleff et.al. | 2503.20906 | null |
| 2025-03-24 | The Centers and Margins of Modeling Humans in Well-being Technologies: A Decentering Approach | Jichen Zhu et.al. | 2503.19132 | null |
| 2025-05-14 | Spectropolarimetry of A Nuclear Transient AT2023clx: Revealing The Geometrical Alignment between The Transient Outflow and The Nuclear Dusty Region | Kohki Uno et.al. | 2503.19024 | null |
| 2025-03-23 | A Novel Hat-Shaped Device-Cloud Collaborative Inference Framework for Large Language Models | Zuan Xie et.al. | 2503.18989 | null |
| 2025-03-23 | A Multi-Model Adaptation of Speculative Decoding for Classification | Somnath Roy et.al. | 2503.18076 | null |
| 2025-03-20 | SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs | Shibo Jie et.al. | 2503.16163 | null |
| 2025-03-20 | “This could save us months of work” – Use Cases of AI and Automation Support in Investigative Journalism | Besjon Cifliku et.al. | 2503.16011 | null |
| 2025-03-20 | SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models | Fahao Chen et.al. | 2503.15921 | null |
| 2025-03-19 | Prada: Black-Box LLM Adaptation with Private Data on Resource-Constrained Devices | Ziyao Wang et.al. | 2503.14932 | null |
| 2025-06-12 | The Origin of the Very-High-Energy Diffuse $γ$-Ray Emission: The Case for Galactic Source Cocoons | Antonio Ambrosone et.al. | 2503.14651 | null |
| 2025-05-04 | Superconductivity in magnetars: Exploring type-I and type-II states in toroidal magnetic fields | Mayusree Das et.al. | 2503.14594 | null |
| 2025-03-26 | Association of 220 PeV Neutrino KM3-230213A with Gamma-Ray Bursts | Ruiqi Wang et.al. | 2503.14471 | null |
| 2025-03-18 | Neutron portal to ultra-high-energy neutrinos | Gustavo F. S. Alves et.al. | 2503.14419 | null |
| 2025-03-18 | Speculative Decoding for Verilog: Speed and Quality, All in One | Changran Xu et.al. | 2503.14153 | null |
| 2025-03-18 | Growing a Twig to Accelerate Large Vision-Language Models | Zhenwei Shao et.al. | 2503.14075 | null |
| 2025-03-17 | ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts | Evangelos Georganas et.al. | 2503.13565 | null |
| 2025-03-17 | Enhanced anomalous Hall effect in the topological Kagome metal Cs(V$_{1-x}$Mn$_x$)$_3$Sb$_5$ | Xinmin Wang et.al. | 2503.13351 | null |
| 2025-03-28 | WOW: Workflow-Aware Data Movement and Task Scheduling for Dynamic Scientific Workflows | Fabian Lehmann et.al. | 2503.13072 | link |
| 2025-05-15 | Collaborative Speculative Inference for Efficient LLM Inference Serving | Luyao Gao et.al. | 2503.10325 | null |
| 2025-03-13 | Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding | Jinze Li et.al. | 2503.10135 | null |
| 2025-03-12 | A practical guide to machine learning interatomic potentials – Status and future | Ryan Jacobs et.al. | 2503.09814 | null |
| 2025-03-11 | In Search of the Potentially Hazardous Asteroids in the Taurid Resonant Swarm | Jasmine Li et.al. | 2503.08670 | null |
| 2025-03-11 | Liquidity Competition Between Brokers and an Informed Trader | Ryan Donnelly et.al. | 2503.08287 | null |
| 2025-03-25 | Training Domain Draft Models for Speculative Decoding: Best Practices and Insights | Fenglu Hong et.al. | 2503.07807 | null |
| 2025-03-10 | Did smartphones break the world as we knew it? | Mikhail V. Tamm et.al. | 2503.07773 | null |
| 2025-03-13 | Design as Hope: Reimagining Futures for Seemingly Doomed Problems | JaeWon Kim et.al. | 2503.07586 | null |
| 2025-03-09 | A parallel parser for regular expressions | Angelo Borsotti et.al. | 2503.06763 | null |
| 2025-03-07 | Quantum-like cognition and decision making in the light of quantum measurement theory | Miho Fuyama et.al. | 2503.05859 | null |
| 2025-02-25 | Generative Artificial Intelligence: Evolving Technology, Growing Societal Impact, and Opportunities for Information Systems Research | Veda C. Storey et.al. | 2503.05770 | null |
| 2025-03-10 | Speculative Decoding for Multi-Sample Inference | Yiwei Li et.al. | 2503.05330 | null |
| 2025-03-07 | SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding | Kaiyu Huang et.al. | 2503.05096 | null |
| 2025-02-11 | Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations | Kunal Handa et.al. | 2503.04761 | null |
| 2025-03-19 | Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling | Yan Li et.al. | 2503.04398 | null |
| 2025-03-06 | A possible jet and corona configuration for Swift J1727.8–1613 during the hard state | Jing-Qiang Peng et.al. | 2503.04044 | null |
| 2025-03-05 | RASD: Retrieval-Augmented Speculative Decoding | Guofeng Quan et.al. | 2503.03434 | null |
| 2025-03-26 | SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling | Cunchi Lv et.al. | 2503.02550 | null |
| 2025-04-02 | Linear Representations of Political Perspective Emerge in Large Language Models | Junsol Kim et.al. | 2503.02080 | link |
| 2025-04-23 | EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test | Yuhui Li et.al. | 2503.01840 | link |
| 2025-03-03 | Efficient Long-Term Structural Reliability Estimation with Non-Gaussian Stochastic Models: A Design of Experiments Approach | Sebastian Winter et.al. | 2503.01566 | null |
| 2025-03-17 | MeshPad: Interactive Sketch-Conditioned Artist-Designed Mesh Generation and Editing | Haoxuan Li et.al. | 2503.01425 | null |
| 2025-03-24 | Turbulence in virtual: II. Origin of skewness and dual fraction processes | Xunchuan Liu et.al. | 2503.01160 | null |
| 2025-03-02 | DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting | Kai Lv et.al. | 2503.00784 | link |
| 2025-03-02 | Speculative Ad-hoc Querying | Haoyu Li et.al. | 2503.00714 | link |
| 2025-03-04 | Tutorial Proposal: Speculative Decoding for Efficient LLM Inference | Heming Xia et.al. | 2503.00491 | null |
| 2025-03-01 | Peek into the ‘White-Box’: A Field Study on Bystander Engagement with Urban Robot Uncertainty | Xinyan Yu et.al. | 2503.00337 | null |
| 2025-03-01 | Doraemon’s Gadget Lab: Unpacking Human Needs and Interaction Design in Speculative Technology | Tram Thi Minh Tran et.al. | 2503.00257 | null |
| 2025-02-28 | Broadband pulsed quadrature measurements with calorimeters | Ezad Shojaee et.al. | 2503.00188 | null |
| 2025-02-28 | AMuLeT: Automated Design-Time Testing of Secure Speculation Countermeasures | Bo Fu et.al. | 2503.00145 | link |
| 2025-02-28 | Assessment of universal relations among second-order moments of relativistic stars via reformulated perturbation equations | Koutarou Kyutoku et.al. | 2503.00098 | null |
| 2025-02-14 | A Short History of Rocks: or, How to Invent Quantum Computing | David Wakeham et.al. | 2503.00005 | null |
| 2025-05-13 | Nano Drone-based Indoor Crime Scene Analysis | Martin Cooney et.al. | 2502.21019 | null |
| 2025-03-04 | Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff | Maximilian Holsman et.al. | 2502.20704 | link |
| 2025-02-28 | MonadBFT: Fast, Responsive, Fork-Resistant Streamlined Consensus | Mohammad Mussadiq Jalalzai et.al. | 2502.20692 | null |
| 2025-03-24 | Turbulence in virtual: Origin of the variance and skewness of density function | Xunchuan Liu et.al. | 2502.20458 | null |
| 2025-02-27 | Long-Context Inference with Retrieval-Augmented Speculative Decoding | Guanzheng Chen et.al. | 2502.20330 | link |
| 2025-04-28 | Frobenius subalgebra lattices in tensor categories | Mainak Ghosh et.al. | 2502.19876 | null |
| 2025-03-04 | Speculative Decoding and Beyond: An In-Depth Survey of Techniques | Yunhai Hu et.al. | 2502.19732 | null |
| 2025-02-26 | From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens | Tong Wu et.al. | 2502.18890 | link |
| 2025-02-26 | Reimagining Personal Data: Unlocking the Potential of AI-Generated Images in Personal Data Meaning-Making | Soobin Park et.al. | 2502.18853 | null |
| 2025-02-26 | Towards Optimal Multi-draft Speculative Decoding | Zhengmian Hu et.al. | 2502.18779 | null |
| 2025-03-02 | Variability of Central Stars of Planetary Nebulae with the Zwicky Transient Facility. II. Long-Timescale Variables including Wide Binary and Late Thermal Pulse Candidates | Soumyadeep Bhattacharjee et.al. | 2502.18651 | null |
| 2025-02-27 | Kinematics of metallicity populations in Omega Centauri using Gaia Focused Product Release and Hubble Space Telescope | Nagaraj Vernekar et.al. | 2502.17755 | null |
| 2025-02-24 | Knowledge Distillation with Training Wheels | Guanlin Liu et.al. | 2502.17717 | null |
| 2025-02-24 | THOR: A Non-Speculative Value Dependent Timing Side Channel Attack Exploiting Intel AMX | Farshad Dizani et.al. | 2502.17658 | null |
| 2025-02-24 | LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification | Penghui Yang et.al. | 2502.17421 | link |
| 2025-02-24 | Defects in the $β$-Ga$_2$O$_3$($\bar201$)/HfO$_2$ MOS system and the effect of thermal treatments | Khushabu. S. Agrawal et.al. | 2502.17112 | null |
| 2025-05-25 | CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter | Yepeng Weng et.al. | 2502.16880 | null |
| 2025-02-24 | APINT: A Full-Stack Framework for Acceleration of Privacy-Preserving Inference of Transformers based on Garbled Circuits | Hyunjun Cho et.al. | 2502.16877 | null |
| 2025-04-03 | Towards Reinforcement Learning for Exploration of Speculative Execution Vulnerabilities | Evan Lai et.al. | 2502.16756 | null |
| 2025-02-22 | Fluctuating Lattice, Several Energy Scales | Holger Bech Nielsen et.al. | 2502.16369 | null |
| 2025-02-21 | DReSD: Dense Retrieval for Speculative Decoding | Milan Gritta et.al. | 2502.15572 | link |
| 2025-02-27 | PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System | Yintao He et.al. | 2502.15470 | null |
| 2025-02-24 | Ultra-high-energy $γ$-ray emission associated with the tail of a bow-shock pulsar wind nebula | Zhen Cao et.al. | 2502.15447 | null |
| 2025-02-21 | TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding | Zhaoxuan Wu et.al. | 2502.15197 | null |
| 2025-02-21 | A Critical Examination of the Nested Leaky Box Model for Galactic Cosmic Ray Transport | Benedikt Schroer et.al. | 2502.15115 | null |
| 2025-03-11 | FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling | Weilin Zhao et.al. | 2502.14856 | null |
| 2025-05-07 | Fusion rules and structure constants of E-series minimal models | Rongvoram Nivesvivat et.al. | 2502.14295 | null |
| 2025-02-19 | Which Attention Heads Matter for In-Context Learning? | Kayo Yin et.al. | 2502.14010 | link |
| 2025-03-17 | NVR: Vector Runahead on NPUs for Sparse Memory Access | Hui Wang et.al. | 2502.13873 | null |
| 2025-02-19 | Hierarchical accretion flow from the G351 infrared dark filament to its central cores | H. Beuther et.al. | 2502.13866 | null |
| 2025-02-19 | C2T: A Classifier-Based Tree Construction Method in Speculative Decoding | Feiye Huo et.al. | 2502.13652 | null |
| 2025-02-19 | Near-extremal dumb holes and some aspects of the Hawking effect | Akshat Pandey et.al. | 2502.13557 | null |
| 2025-02-19 | Radio observations of the ultra-long GRB 220627A reveal a hot cocoon supporting the blue supergiant progenitor scenario | James K. Leung et.al. | 2502.13435 | null |
| 2025-02-18 | Inconsistent metallicity spreads in first generation stars of globular clusters from high resolution spectroscopy and HST photometry | Eugenio Carretta et.al. | 2502.13206 | null |
| 2025-02-17 | SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | Yige Xu et.al. | 2502.12134 | null |
| 2025-02-16 | AI Generations: From AI 1.0 to AI 4.0 | Jiahao Wu et.al. | 2502.11312 | null |
| 2025-02-16 | Coherent Spin Pumping Originated from Sub-Terahertz Néel Vector Dynamics in Easy Plane α-Fe2O3/Pt | Gregory Fritjofson et.al. | 2502.11281 | null |
| 2025-02-16 | GRIFFIN: Effective Token Alignment for Faster Speculative Decoding | Shijing Hu et.al. | 2502.11018 | link |
| 2025-02-05 | QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache | Rishabh Tiwari et.al. | 2502.10424 | null |
| 2025-02-13 | Rosette Nebula Outburst Gaia 24djk from the Young Stellar Object V557 Mon | Adolfo S. Carvalho et.al. | 2502.09523 | null |
| 2025-02-13 | $^{18}$F-FDG brain PET hypometabolism in post-SARS-CoV-2 infection: substrate for persistent/delayed disorders? | Eric Guedj et.al. | 2502.09077 | null |
| 2025-02-13 | CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality | Razvan-Gabriel Dumitru et.al. | 2502.08923 | link |
| 2025-03-19 | Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding | Ziyao Wang et.al. | 2502.08020 | null |
| 2025-04-13 | Regular Black Holes in Lovelock gravity with a Degenerate AdS Ground State and their shadows | Milko Estrada et.al. | 2502.07992 | null |
| 2025-03-06 | Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs | Ruichen Zhang et.al. | 2502.07942 | null |
| 2025-02-05 | Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference | Toby Simonds et.al. | 2502.06833 | null |
| 2025-02-10 | Persistent spin grids with spin-orbit coupled 2D electron gas | A. V. Poshakinskiy et.al. | 2502.06745 | null |
| 2025-03-27 | LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models | Sihwan Park et.al. | 2502.06352 | link |
| 2025-02-10 | Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Haiduo Huang et.al. | 2502.06282 | link |
| 2025-02-08 | Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding | Sukmin Cho et.al. | 2502.05609 | link |
| 2025-01-31 | Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies | Nadav Timor et.al. | 2502.05202 | null |
| 2025-02-07 | Learning Universal Multi-level Market Irrationality Factors to Improve Stock Return Forecasting | Chen Yang et.al. | 2502.04737 | null |
| 2025-02-06 | Speeding up Speculative Decoding via Approximate Verification | Meiyu Zhong et.al. | 2502.04557 | null |
| 2025-02-06 | Gig2Gether: Data-sharing to Empower, Unify and Demystify Gig Work | Jane Hsieh et.al. | 2502.04482 | null |
| 2025-02-06 | The Evolution of Hypervelocity Supernova Survivors and the Outcomes of Interacting Double White Dwarf Binaries | Ken J. Shen et.al. | 2502.04451 | null |
| 2025-02-06 | Properties of the emission region in pulsars with opposite subpulse drift directions in different profile components | H. M. Tedila et.al. | 2502.03833 | null |
| 2025-02-05 | COSMOS-Web: The emergence of the Hubble Sequence | M. Huertas-Company et.al. | 2502.03532 | null |
| 2025-02-13 | FSLH: Flexible Mechanized Speculative Load Hardening | Roberto Blanco et.al. | 2502.03203 | null |
| 2025-02-05 | How probable is the Lyman-$α$ damping wing in the spectrum of the redshift z = 5.9896 quasar ULAS J0148+0600? | Fiona Sawyer et.al. | 2502.03085 | null |
| 2025-02-05 | A comprehensive study of the gas-phase formation network of HC$_5$N: theory, experiments, observations and models | Lisa Giani et.al. | 2502.03046 | null |
| 2025-04-17 | The connection between high-redshift galaxies and Lyman $α$ transmission in the Sherwood-Relics simulations of patchy reionisation | Luke Conaboy et.al. | 2502.02983 | null |
| 2025-02-05 | Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation | Jingyu Liu et.al. | 2502.02789 | link |
| 2025-02-04 | EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization | Yize Wu et.al. | 2502.02493 | null |
| 2025-02-04 | M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference | Nikhil Bhendawade et.al. | 2502.02040 | null |
| 2025-02-03 | Cosmic Ray Feedback in Massive Halos: Implications for the Distribution of Baryons | Eliot Quataert et.al. | 2502.01753 | null |
| 2025-02-01 | Speculative Ensemble: Fast Large Language Model Ensemble via Speculation | Jiale Fu et.al. | 2502.01662 | link |
| 2025-02-03 | Time-dependent solutions of biadjoint scalar field theories | Kymani Armstrong-Williams et.al. | 2502.01294 | null |
| 2025-02-02 | Constructing AI ethics narratives based on real-world data: Human-AI collaboration in data-driven visual storytelling | Mengyi Wei et.al. | 2502.00637 | null |
| 2025-02-01 | Predicting the number density of heavy seed massive black holes due to an intense Lyman-Werner field | Hannah O’Brennan et.al. | 2502.00574 | null |
| 2025-02-04 | Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation | Yang Cao et.al. | 2502.00500 | null |
| 2025-02-14 | Reward-Guided Speculative Decoding for Efficient LLM Reasoning | Baohao Liao et.al. | 2501.19324 | null |
| 2025-01-31 | Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment | Gregor Bachmann et.al. | 2501.19309 | null |
| 2025-02-19 | Emancipatory Information Retrieval | Bhaskar Mitra et.al. | 2501.19241 | null |
| 2025-01-31 | Trading Inference-Time Compute for Adversarial Robustness | Wojciech Zaremba et.al. | 2501.18841 | null |
| 2025-01-30 | Human Re-ID Meets LVLMs: What can we expect? | Kailash Hambarde et.al. | 2501.18698 | null |
| 2025-01-28 | How Hamilton-Jacobi formalism helps to address the physical meaning of the wave function in Bohmian mechanics | Arnaud Amblard et.al. | 2501.16989 | null |
| 2025-03-04 | Distilling Large Language Models for Network Active Queue Management | Deol Satish et.al. | 2501.16734 | null |
| 2025-01-24 | The disrupting and growing open cluster spiral arm patterns of the Milky Way | Xiaochen Liu et.al. | 2501.14215 | null |
| 2025-01-19 | Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks | Diego Gosmar et.al. | 2501.13946 | link |
| 2025-01-23 | Inflaton Self Resonance, Oscillons, and Gravitational Waves in Small Field Polynomial Inflation | Manuel Drees et.al. | 2501.13811 | null |
| 2025-01-23 | Considerations on the Origin of IRAS 19312+1950 Based on Long-Term Maser Observations | Huan-Xue Feng et.al. | 2501.13769 | null |
| 2025-01-23 | Compiler Support for Speculation in Decoupled Access/Execute Architectures | Robert Szafarczyk et.al. | 2501.13553 | null |
| 2025-02-01 | Concentration in Governance Control Across Decentralised Finance Protocols | Thomas Eisermann et.al. | 2501.13377 | link |
| 2025-01-22 | The outer structure of old star clusters in the Small Magellanic Cloud | Andrés E. Piatti et.al. | 2501.13062 | null |
| 2025-01-22 | Entanglement dynamics in collision models and entanglement quilts | Le Hu et.al. | 2501.12629 | null |
| 2025-01-22 | Link in $\mathbb{R}\mathbb{P}^3$ and the Topological Vertex | John Chae et.al. | 2501.12566 | null |
| 2025-01-21 | AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding | Zikun Li et.al. | 2501.12162 | null |
| 2025-01-20 | MIDIS: Quantifying the AGN component of X-ray-detected galaxies | Steven Gillman et.al. | 2501.11491 | null |
| 2025-01-23 | The JWST EXCELS survey: an extremely metal-poor galaxy at $z=8.271$ hosting an unusual population of massive stars | F. Cullen et.al. | 2501.11099 | null |
| 2025-01-30 | Vortices for lake equations (review with questions and speculations) | Jair Koiller et.al. | 2501.10433 | null |
| 2025-01-17 | From strong to weak correlations in breathing-mode kagome van der Waals materials: Nb$_3$(F,Cl,Br,I)$_8$ as a robust and versatile platform for many-body engineering | Joost Aretz et.al. | 2501.10320 | null |
| 2025-01-16 | 25 years of XMM-Newton observations of the Sgr A complex: 3D distribution and internal structure of the clouds | G. Stel et.al. | 2501.09737 | null |
| 2025-01-16 | Weak electronic correlations in the cobalt oxychalcogenide superconductor Na2CoSe2O | Zhenchao Wu et.al. | 2501.09675 | null |
| 2025-02-11 | Anatomy of a Digital Bubble: Lessons Learned from the NFT and Metaverse Frenzy | Daisuke Kawai et.al. | 2501.09601 | null |
| 2025-01-16 | A universal break in energy functions of three hyperactive repeating fast radio bursts | Q. Wu et.al. | 2501.09248 | null |
| 2025-01-15 | The emission of interpulses by a 6.45-hour period coherent radio transient | Y. W. J. Lee et.al. | 2501.09133 | null |
| 2025-01-13 | Cassiopeia A’s Reverse Shock and its Effects on the Expanding SN Ejecta | Robert A. Fesen et.al. | 2501.07708 | null |
| 2025-01-11 | Is the Monetary Transmission Mechanism Broken? Time for People’s Quantitative Easing | Sebastian Dragoe et.al. | 2501.06575 | null |
| 2025-01-27 | QPEs as Lense-Thirring precession of super-Eddington flows | M. Middleton et.al. | 2501.06185 | link |
| 2025-01-10 | Analysing the coverage of the University of Bologna’s publication metadata in an existing source of open research information | Erica Andreose et.al. | 2501.05821 | null |
| 2025-01-09 | Accelerated Diffusion Models via Speculative Sampling | Valentin De Bortoli et.al. | 2501.05370 | null |
| 2025-01-09 | The CO-Fuelled Time Machine: Tracing Birth Conditions and Terrestrial Planet Formation Outcomes in HD 163296 through Pebble Drift-induced CO Enhancements | Joe Williams et.al. | 2501.05316 | null |
| 2025-01-09 | Observational Study of the Atmospheric Gravity Waves in the lower Solar Atmosphere | Ravi Chaurasiya et.al. | 2501.05042 | null |
| 2025-01-07 | Transparent Decompilation for Timing Side-Channel Analyses | Santiago Arranz Olmos et.al. | 2501.04183 | null |
| 2025-01-07 | Spin Environment of a Superconducting Qubit in High Magnetic Fields | S. Günzler et.al. | 2501.03661 | null |
| 2025-01-07 | Neural Cellular Automata and Deep Equilibrium Models | Zhibai Jia et.al. | 2501.03573 | null |
| 2025-01-07 | CI at Scale: Lean, Green, and Fast | Dhruva Juloori et.al. | 2501.03440 | null |
| 2025-01-02 | Vertex algebras, topological defects, and Moonshine | Roberto Volpato et.al. | 2412.21141 | null |
| 2024-12-30 | Strategic Learning and Trading in Broker-Mediated Markets | Alif Aqsha et.al. | 2412.20847 | null |
| 2024-12-28 | From Worms to Mice: Homeostasis Maybe All You Need | Jesus Marco de Lucas et.al. | 2412.20090 | null |
| 2025-01-13 | HADES: Hardware Accelerated Decoding for Efficient Speculation in Large Language Models | Ze Yang et.al. | 2412.19925 | null |
| 2024-12-27 | Cosmohedra | Nima Arkani-Hamed et.al. | 2412.19881 | null |
| 2024-12-27 | Paleoinspired Vision: From Exploring Colour Vision Evolution to Inspiring Camera Design | Junjie Zhang et.al. | 2412.19439 | null |
| 2024-12-25 | Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference | Libo Zhang et.al. | 2412.18934 | null |
| 2024-12-25 | AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures | Situo Zhang et.al. | 2412.18910 | null |
| 2024-12-23 | The Unique Helium Nova V445 Puppis Ejected $\gg$0.001 M$_{\odot}$ in the Year 2000 and Will Not Become a Type Ia Supernova | Bradley E. Schaefer et.al. | 2412.17286 | null |
| 2024-12-20 | Gravitational Observatories in AdS$_4$ | Dionysios Anninos et.al. | 2412.16305 | null |
| 2024-12-20 | Two-Part Interplanetary Type II Solar Radio Bursts | Silja Pohjolainen et.al. | 2412.15961 | null |
| 2025-01-10 | Minimizing speculation overhead in a parallel recognizer for regular texts | Angelo Borsotti et.al. | 2412.14975 | null |
| 2025-01-13 | $\mathcal{N}=2$ superconformal gravitino in harmonic superspace | Evgeny Ivanov et.al. | 2412.14822 | null |
| 2025-02-07 | The JWST/NIRSpec view of the nuclear region in the prototypical merging galaxy NGC 6240 | Matteo Ceci et.al. | 2412.14685 | null |
| 2024-12-18 | Fermion-Portal Dark Matter at a High-Energy Muon Collider | Pouya Asadi et.al. | 2412.14235 | null |
| 2024-12-18 | Current and secular accretion rates of EX Hydrae | K. Beuermann et.al. | 2412.13850 | null |
| 2024-12-18 | Fool’s gold: ligand-receptor interactions and the origins of life | Betony Adams et.al. | 2412.13836 | null |
| 2024-12-18 | Diffusion models and stochastic quantisation in lattice field theory | Gert Aarts et.al. | 2412.13704 | null |
| 2024-12-17 | Distributed Speculative Execution for Resilient Cloud Applications | Tianyu Li et.al. | 2412.13314 | null |
| 2024-12-17 | Where do X-ray low surface brightness clusters sit with respect to filaments? | S. Zarattini et.al. | 2412.13258 | null |
| 2024-12-17 | Agnosticism About Artificial Consciousness | Tom McClelland et.al. | 2412.13145 | null |
| 2024-12-17 | Insight into the Starburst Nature of Galaxy GN-z11 with JWST MIRI Spectroscopy | J. Álvarez-Márquez et.al. | 2412.12826 | null |
| 2025-03-18 | Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models | Seungeun Oh et.al. | 2412.12687 | null |
| 2024-12-26 | Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | Xiangxiang Gao et.al. | 2412.12639 | null |
| 2024-12-15 | Heat kernel and local index theorem for open complex manifolds with $\mathbb{C}^{\ast}$-action | Jih-Hsin Cheng et.al. | 2412.11037 | null |
| 2024-12-14 | The JWST-NIRCam View of Sagittarius C. II. Evidence for Magnetically Dominated HII Regions in the CMZ | John Bally et.al. | 2412.10983 | null |
| 2025-02-23 | Interference in Fuzzy Dark Matter Filaments: Idealised Models and Statistics | Tim Zimmermann et.al. | 2412.10829 | null |
| 2025-02-10 | Constrained Decoding with Speculative Lookaheads | Nishanth Nakshatri et.al. | 2412.10418 | null |
| 2025-01-15 | Asymmetric Temperature Variations In Protoplanetary disks: I. Linear Theory, Corotating Spirals, and Ring Formation | Zhaohuan Zhu et.al. | 2412.09571 | null |
| 2024-12-12 | AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs’ Complex Reasoning Capabilities | Fabrizio Davide et.al. | 2412.09385 | null |
| 2024-12-11 | Can transformative AI shape a new age for our civilization?: Navigating between speculation and reality | Jesus L. Lobo et.al. | 2412.08273 | null |
| 2024-12-10 | Mapping the spatial extent of HI-rich absorbers using MgII absorption along gravitational arcs | Trystyn A. M. Berg et.al. | 2412.07652 | null |
| 2024-12-26 | CoinCLIP: A Multimodal Framework for Assessing Viability in Web3 Memecoins | Hou-Wan Long et.al. | 2412.07591 | null |
| 2024-12-10 | Modeling Speculative Trading Patterns in Token Markets: An Agent-Based Analysis with TokenLab | Mengjue Wang et.al. | 2412.07512 | null |
| 2024-12-10 | KPZ-like scaling on a high-dimensional hypersphere | Daniil Fedotov et.al. | 2412.07432 | null |
| 2024-12-10 | Exploring types I and IIA effective actions through T-duality | Mohammad R. Garousi et.al. | 2412.07234 | null |
| 2024-12-10 | Relativistic Mott transition in strongly correlated artificial graphene | Liguo Ma et.al. | 2412.07150 | null |
| 2024-12-10 | Gravitational focusing and horizon entropy for higher-spin fields | Zihan Yan et.al. | 2412.07107 | null |
| 2024-12-09 | Inelastic H + H$^+_3$ Collision rates and their impact in the determination of the excitation temperature of H$^+_3$ | Daniel Felix-Gonzalez et.al. | 2412.06697 | null |
| 2024-12-09 | Systematic comparison of deep generative models applied to multivariate financial time series | Howard Caulfield et.al. | 2412.06417 | null |
| 2024-12-09 | Beyond pip install: Evaluating LLM Agents for the Automated Installation of Python Projects | Louis Milliken et.al. | 2412.06294 | link |
| 2024-12-06 | Revisiting the hallmark freezing and melting points in colloidal dispersions and the search for the elusive coexistence region | J. Galen Wang et.al. | 2412.05422 | null |
| 2024-12-06 | Penetrative rotating magnetoconvection subject to lateral variations in temperature gradients | Tirtharaj Barman et.al. | 2412.05235 | null |
| 2024-12-06 | Predictive Window Decoding for Fault-Tolerant Quantum Programs | Joshua Viszlai et.al. | 2412.05115 | null |
| 2024-12-04 | Successive magnetic transitions in the spin-5/2 easy-axis triangular-lattice antiferromagnet Na$_2$BaMn(PO$_4$)$_2$: A neutron diffraction study | Chuandi Zhang et.al. | 2412.03149 | null |
| 2025-01-02 | The Reality of AI and Biorisk | Aidan Peppin et.al. | 2412.01946 | null |
| 2024-12-02 | PLD+: Accelerating LLM inference by leveraging Language Model Artifacts | Shwetha Somasundaram et.al. | 2412.01447 | null |
| 2024-12-02 | Enhanced solid solution hardening by off-center substitutional solute atoms in α-Ti | Zi-Han Yu et.al. | 2412.01298 | null |
| 2024-11-25 | Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration | Zhuofan Wen et.al. | 2412.00061 | null |
| 2024-11-12 | The Copernican Argument for Alien Consciousness; The Mimicry Argument Against Robot Consciousness | Eric Schwitzgebel et.al. | 2412.00008 | null |
| 2024-11-28 | Night-Side Relativistic Electron Precipitation Bursts in the Outer Radiation Belt: Insights from ELFIN and THEMIS | Xi Lu et.al. | 2411.19232 | null |
| 2024-11-27 | Magnetic field tuned superconducting and normal phase magnetism in CeCo$_{0.5}$Rh$_{0.5}$In$_{5}$ | A. Howell et.al. | 2411.18540 | null |
| 2024-11-27 | Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding | Ziyin Zhang et.al. | 2411.18462 | link |
| 2024-11-27 | 6G Takes Shape | Jeffrey G. Andrews et.al. | 2411.18435 | null |
| 2024-11-27 | An evolution of matrix-valued orthogonal polynomials | Erik Koelink et.al. | 2411.18362 | null |
| 2024-11-27 | Comprehensive Kernel Safety in the Spectre Era: Mitigations and Performance Evaluation (Extended Version) | Davide Davoli et.al. | 2411.18094 | null |
| 2024-12-25 | Stellar evolution along the AGB as revealed by the shape of Miras’ visual light curves | D. T. Hoai et.al. | 2411.18044 | null |
| 2024-11-26 | Stable curves and chromatic polynomials | Bernhard Reinke et.al. | 2411.17551 | null |
| 2024-12-08 | A revamped understanding of Cosmic Rays and Gamma-Ray Bursts | A. De Rújula et.al. | 2411.15850 | null |
| 2024-11-20 | The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz | David Noever et.al. | 2411.14486 | null |
| 2024-12-03 | Mediating Modes of Thought: LLM’s for design scripting | Moritz Rietschel et.al. | 2411.14485 | null |
| 2024-11-21 | THz optical response of Ba(Fe$_{1-x}$Ni$_x$)$_2$As$_2$ films analyzed within the three-band Eliashberg s$\pm$-wave model | Yurii A. Aleshchenko et.al. | 2411.14011 | null |
| 2024-11-27 | Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding | Hyun Ryu et.al. | 2411.13157 | null |
| 2024-11-20 | Far-field Boundary Conditions for Airfoil Simulation at High Incidence in Steady, Incompressible, Two-dimensional Flow | Narges Golmirzaee et.al. | 2411.13077 | null |
| 2024-11-19 | Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing | Ruyi Ding et.al. | 2411.12508 | null |
| 2025-09-30 | Continuous Speculative Decoding for Autoregressive Image Generation | Zili Wang et.al. | 2411.11925 | null |
| 2024-12-26 | Teapot: Efficiently Uncovering Spectre Gadgets in COTS Binaries | Fangzheng Lin et.al. | 2411.11624 | null |
| 2024-11-30 | Diversity of disc viscosities can explain the period ratios of resonant and non-resonant systems of hot super-Earths and mini-Neptunes | Bertram Bitsch et.al. | 2411.11452 | null |
| 2024-11-25 | First memoir on the asymptotics of certain infinite products | Wadim Zudilin et.al. | 2411.11100 | null |
| 2024-11-17 | FastDraft: How to Train Your Draft | Ofir Zafrir et.al. | 2411.11055 | null |
| 2024-12-16 | SAM Decoding: Speculative Decoding via Suffix Automaton | Yuxuan Hu et.al. | 2411.10666 | link |
| 2024-11-15 | Moving Forward: A Review of Autonomous Driving Software and Hardware Systems | Xu Wang et.al. | 2411.10291 | null |
| 2024-11-14 | Cosmic inflation in an extended non-commutative foliated quantum gravity: the wave function of the universe | César A. Zen Vasconcellos et.al. | 2411.09756 | null |
| 2024-11-15 | Provocation: Who benefits from “inclusion” in Generative AI? | Samantha Dalal et.al. | 2411.09102 | null |
| 2024-11-13 | Thought Experiments in Design Fiction for Visualization | Swaroop Panda et.al. | 2411.08621 | null |
| 2025-01-01 | A Geometric Substructure for Quantum Dynamics | Anthony John Bracken et.al. | 2411.08230 | null |
| 2025-01-11 | The Grass of the Universe: Rethinking Technosphere, Planetary History, and Sustainability with Fermi Paradox | Lukáš Likavčan et.al. | 2411.08057 | null |
| 2024-11-12 | A rich structure of renormalization group flows for Higgs-like models in 4 dimensions | André LeClair et.al. | 2411.07476 | null |
| 2024-11-12 | Input-Based Ensemble-Learning Method for Dynamic Memory Configuration of Serverless Computing Functions | Siddharth Agarwal et.al. | 2411.07444 | null |
| 2024-11-11 | The Inherent Adversarial Robustness of Analog In-Memory Computing | Corey Lammie et.al. | 2411.07023 | null |
| 2024-11-10 | Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents | Yu Gu et.al. | 2411.06559 | link |
| 2024-11-10 | MOCCA-III: Effects of pristine gas accretion and cluster migration on globular cluster evolution, global parameters and multiple stellar populations | Mirek Giersz et.al. | 2411.06421 | null |
| 2024-11-10 | Generating Mixcode Popular Songs with Artificial Intelligence: Concepts, Plans, and Speculations | Abhishek Kaushik et.al. | 2411.06420 | null |
| 2024-11-08 | SSSD: Simply-Scalable Speculative Decoding | Michele Marzollo et.al. | 2411.05894 | null |
| 2024-11-08 | SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding | Ryan Sun et.al. | 2411.05289 | link |
| 2024-11-07 | SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference | Gabriele Oliaro et.al. | 2411.04975 | null |
| 2024-11-06 | The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation | Lawrence Stewart et.al. | 2411.03786 | null |
| 2024-11-05 | Remarkable Scale Relation, Approximate SU(5), Fluctuating Lattice | Holger Bech Nielsen et.al. | 2411.03552 | null |
| 2024-11-05 | Shared Memory-Aware Latency-Sensitive Message Aggregation for Fine-Grained Communication | Kavitha Chandrasekar et.al. | 2411.03533 | null |
| 2024-11-07 | A high resolution simulation of protoplanetary disk turbulence driven by the vertical shear instability | Karim Shariff et.al. | 2411.03467 | null |
| 2024-11-04 | PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption | Yifan Tan et.al. | 2411.03357 | null |
| 2024-11-05 | On the possible core shift break in relativistic jets | E. E. Nokhrina et.al. | 2411.02925 | null |
| 2024-11-04 | A proof of self-organized criticality in a sandpile | Christopher Hoffman et.al. | 2411.02541 | null |
| 2025-02-07 | Pseudo Transitions in the Finite-Size Blume-Capel Model | Lei Shi et.al. | 2411.01743 | null |
| 2024-11-05 | Privacy Risks of Speculative Decoding in Large Language Models | Jiankun Wei et.al. | 2411.01076 | null |
| 2024-10-30 | Accelerated AI Inference via Dynamic Execution Methods | Haim Barad et.al. | 2411.00853 | null |
| 2024-11-05 | A Theoretical Perspective for Speculative Decoding Algorithm | Ming Yin et.al. | 2411.00841 | null |
| 2024-10-31 | Interpretable Language Modeling via Induction-head Ngram Models | Eunji Kim et.al. | 2411.00066 | link |
| 2024-10-31 | ALISE: Accelerating Large Language Model Serving with Speculative Scheduling | Youpeng Zhao et.al. | 2410.23537 | null |
| 2024-10-30 | Flavor Patterns of Fundamental Particles from Quantum Entanglement? | Jesse Thaler et.al. | 2410.23343 | null |
| 2024-10-29 | Lost and Found in Speculation: Hybrid Speculative Vulnerability Detection | Mohamadreza Rostami et.al. | 2410.22555 | null |
| 2025-02-10 | Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding | Bohan Li et.al. | 2410.21951 | null |
| 2024-10-29 | Rapid cooling of the Cassiopeia A neutron star due to superfluid quantum criticality | Hao-Fu Zhu et.al. | 2410.21945 | null |
| 2024-10-28 | Model-agnostic basis functions for the 2-point correlation function of dark matter in linear theory | Aseem Paranjape et.al. | 2410.21374 | link |
| 2024-10-11 | The Social Impact of Generative LLM-Based AI | Yu Xie et.al. | 2410.21281 | null |
| 2024-10-28 | On the limits of informationally efficient stock markets: New insights from a chartist-fundamentalist model | Laura Gardini et.al. | 2410.21198 | null |
| 2024-10-27 | A Jet-Induced Shock in a Young, Powerful Radio Galaxy at z=3.00 | Nick Seymour et.al. | 2410.20609 | null |
| 2024-10-27 | FIRP: Faster LLM inference via future intermediate representation prediction | Pengfei Wu et.al. | 2410.20488 | null |
| 2024-10-27 | Inevitable Trade-off between Watermark Strength and Speculative Sampling Efficiency for Language Models | Zhengmian Hu et.al. | 2410.20418 | null |
| 2024-10-31 | Fast Best-of-N Decoding via Speculative Rejection | Hanshi Sun et.al. | 2410.20290 | link |
| 2024-10-24 | Intention Is All You Need | Advait Sarkar et.al. | 2410.18851 | null |
| 2024-10-24 | AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability | Sudhanshu Agrawal et.al. | 2410.18351 | null |
| 2024-10-23 | Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits | Ashish Khisti et.al. | 2410.18234 | null |
| 2025-02-10 | Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition | Artem Basharin et.al. | 2410.17765 | null |
| 2024-10-22 | AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration | Bradley McDanel et.al. | 2410.17375 | link |
| 2024-10-22 | Remote Timing Attacks on Efficient Language Model Inference | Nicholas Carlini et.al. | 2410.17175 | null |
| 2024-10-23 | Quantum many-body scars as remnants of stable many-body periodic orbits | Keita Omiya et.al. | 2410.16916 | null |
| 2024-10-22 | Chiral polaritonics: cavity-mediated enantioselective excitation condensation | Rosario R. Riso et.al. | 2410.16861 | null |
| 2024-10-22 | An Extreme Radio Fluctuation of Pulsar B1929$+$10 | Zhengli Wang et.al. | 2410.16816 | null |
| 2024-10-21 | Galaxy Size and Mass Build-up in the First 2 Gyrs of Cosmic History from Multi-Wavelength JWST NIRCam Imaging | Natalie Allen et.al. | 2410.16354 | null |
| 2024-10-30 | TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling | Jiahao Qiu et.al. | 2410.16033 | null |
| 2024-10-21 | Efficient and Universally Accessible Cross-Chain Options without Upfront Holder Collateral | Zifan Peng et.al. | 2410.15724 | null |
| 2024-10-21 | Investigating Unusual H$α$ Features towards the Scutum Supershell | R. Alsulami et.al. | 2410.15712 | null |
| 2024-10-17 | Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding | Tan Dat Nguyen et.al. | 2410.13839 | null |
| 2024-10-17 | Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions | Michael J. Q. Zhang et.al. | 2410.13788 | null |
| 2024-10-17 | Looking Inward: Language Models Can Learn About Themselves by Introspection | Felix J Binder et.al. | 2410.13787 | link |
| 2024-10-17 | PGC 44685: A Dwarf Star-forming Lenticular Galaxy with Wolf-Rayet Population | Shiying Lu et.al. | 2410.13119 | null |
| 2024-10-16 | Gravitational instantons and the quality problem of the QCD axion: Facts, speculations, and statements in between | Pier Giuseppe Catinari et.al. | 2410.12741 | null |
| 2024-10-15 | Evolution of Ferromagnetism and Electrical Resistivity in Sb-Doped Cr4PtGa17 | Chaoguo Wang et.al. | 2410.12078 | null |
| 2024-10-15 | MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation | Chenxi Wang et.al. | 2410.11779 | link |
| 2024-10-15 | DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure | Yunfan Xiong et.al. | 2410.11744 | null |
| 2024-10-15 | Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling | Wenda Xu et.al. | 2410.11325 | null |
| 2025-02-01 | QSpec: Speculative Decoding with Complementary Quantization Schemes | Juntao Zhao et.al. | 2410.11305 | null |
| 2024-11-20 | Unveiling dust, molecular gas, and high star formation efficiency in extremely UV bright star-forming galaxies at $z\sim 2.1-3.6$ | M. Dessauges-Zavadsky et.al. | 2410.11121 | null |
| 2024-10-01 | Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models | Keivan Alizadeh et.al. | 2410.10846 | null |
| 2024-10-15 | The Discovery of Polarized Water Vapor Megamaser Emission in a Molecular Accretion Disk | Jack F. Gallimore et.al. | 2410.10569 | null |
| 2024-10-14 | Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation | Siru Ouyang et.al. | 2410.10141 | null |
| 2024-11-12 | Self-Data Distillation for Recovering Quality in Pruned Large Language Models | Vithursan Thangarasa et.al. | 2410.09982 | null |
| 2024-10-13 | Super-Bandgap Electroluminescence from Cesium Lead Bromide | Justin Sculley et.al. | 2410.09702 | null |
| 2024-10-21 | On Two Nucleons Near Unitarity with Perturbative Pions | Yu Ping Teng et.al. | 2410.09653 | null |
| 2024-10-11 | Compact [OIII] emission-line regions (“Green Seeds”) in $\mathrm{Hα}$ emitters at Cosmic Noon from JWST Observations | Nuo Chen et.al. | 2410.08520 | null |
| 2024-10-09 | SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration | Heming Xia et.al. | 2410.06916 | link |
| 2025-02-06 | Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level | Xinyi Zeng et.al. | 2410.06809 | null |
| 2024-10-08 | ParallelSpec: Parallel Drafter for Efficient Speculative Decoding | Zilin Xiao et.al. | 2410.05589 | null |
| 2024-10-09 | Density estimation with LLMs: a geometric investigation of in-context learning trajectories | Toni J. B. Liu et.al. | 2410.05218 | null |
| 2024-10-08 | Efficient Inference for Large Language Model-based Generative Recommendation | Xinyu Lin et.al. | 2410.05165 | null |
| 2024-10-04 | Density functional theory based investigation of heavy fermion band candidates in triplet superconductor UTe2 | Shouzheng Liu et.al. | 2410.03840 | null |
| 2024-10-04 | Mixture of Attentions For Speculative Decoding | Matthieu Zimmer et.al. | 2410.03804 | null |
| 2024-10-03 | AI-rays: Exploring Bias in the Gaze of AI Through a Multimodal Interactive Installation | Ziyao Gao et.al. | 2410.03786 | null |
| 2024-09-24 | Nonmetric geometric flows and quasicrystalline topological phases for dark energy and dark matter in $f(Q)$ cosmology | L. Bubuianu et.al. | 2410.03700 | null |
| 2025-01-31 | LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding | Doohyuk Jang et.al. | 2410.03355 | null |
| 2024-10-04 | Generative Edge Detection with Stable Diffusion | Caixia Zhou et.al. | 2410.03080 | null |
| 2024-10-03 | Inductive Generative Recommendation via Retrieval-based Speculation | Yijie Ding et.al. | 2410.02939 | link |
| 2024-10-03 | The Stellar Initial Mass Function of Early Dark Matter-free Gas Objects | William Lake et.al. | 2410.02868 | null |
| 2024-10-03 | Atoms near a conducting wedge: decay rates and entanglement around a corner | Romuald Kilianski et.al. | 2410.02349 | null |
| 2024-10-02 | Time Variation of the Solar Tachocline | Sarbani Basu et.al. | 2410.01895 | null |
| 2024-12-25 | Interpretable Contrastive Monte Carlo Tree Search Reasoning | Zitian Gao et.al. | 2410.01707 | link |
| 2024-10-02 | Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding | Yao Teng et.al. | 2410.01699 | link |
| 2024-12-09 | Forte : Finding Outliers with Representation Typicality Estimation | Debargha Ganguly et.al. | 2410.01322 | link |
| 2024-10-02 | Speculative Coreset Selection for Task-Specific Fine-tuning | Xiaoyu Zhang et.al. | 2410.01296 | null |
| 2024-10-01 | Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity | Michael R. Metel et.al. | 2410.01028 | null |
| 2024-10-01 | A Scheduling-Aware Defense Against Prefetching-Based Side-Channel Attacks | Till Schlüter et.al. | 2410.00452 | null |
| 2024-11-12 | Galactic center G objects as dust-enshrouded stars near the supermassive black hole | Michal Zajaček et.al. | 2410.00304 | null |
| 2024-09-30 | Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface | Wenyue Hua et.al. | 2410.00079 | null |
| 2024-09-30 | Statistical view of orbital circularisation with 14 000 characterised TESS eclipsing binaries | L. W. IJspeert et.al. | 2409.20540 | null |
| 2024-09-30 | New HI observations Toward the NGC 5055 Galaxy Group with FAST | Xiao-Lan Liu et.al. | 2409.20109 | null |
| 2024-09-27 | Thermal Conductivity of Cubic Silicon Carbide Single Crystals Heavily Doped by Nitrogen | Zifeng Huang et.al. | 2409.18843 | null |
| 2024-09-27 | SpecCFA: Enhancing Control Flow Attestation/Auditing via Application-Aware Sub-Path Speculation | Adam Caulfield et.al. | 2409.18403 | null |
| 2025-03-17 | Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference | Zongyue Qin et.al. | 2409.16560 | null |
| 2024-09-22 | ALMASOP. The Localized and Chemically rich Features near the Bases of the Protostellar Jet in HOPS 87 | Shih-Ying Hsu et.al. | 2409.14445 | null |
| 2024-09-21 | Triangulating on Possible Futures: Conducting User Studies on Several Futures Instead of Only One | Antti Salovaara et.al. | 2409.14137 | null |
| 2024-09-29 | String Invention, Viable 3-3-1 Model, Dark Matter Black Holes | Holger B. Nielsen et.al. | 2409.13776 | null |
| 2024-09-20 | Interstellar Glycolaldehyde, Methyl Formate, and Acetic Acid. II. Chemical Modeling of the Bimodal Abundance Pattern in NGC 6334I | Brielle M. Shope et.al. | 2409.13673 | null |
| 2024-09-20 | A Comparison between Financial and Gambling Markets | Haoyu Liu et.al. | 2409.13528 | null |
| 2024-12-12 | Consequences of Minimal Entanglement in Bosonic Field Theories | Spencer Chang et.al. | 2409.13030 | null |
| 2024-09-17 | UNCOVER: Significant Reddening in Cosmic Noon Quiescent Galaxies | Jared Siegel et.al. | 2409.11457 | null |
| 2024-09-17 | The ALMA-CRISTAL Survey: Spatially-resolved Star Formation Activity and Dust Content in 4 < z < 6 Star-forming Galaxies | Juno Li et.al. | 2409.10961 | null |
| 2024-12-14 | Improving Multi-candidate Speculative Decoding | Xiaofan Lu et.al. | 2409.10644 | link |
| 2024-09-16 | Aggregation-diffusion in heterogeneous environments | Jonathan R. Potts et.al. | 2409.10147 | link |
| 2024-12-12 | Pure Lovelock Gravity regular black holes | Milko Estrada et.al. | 2409.09559 | null |
| 2024-09-14 | Ground State Phase Diagram of $\text{SU}(3)$ $t$-$J$ Chain | Junhao Zhang et.al. | 2409.09344 | null |
| 2024-12-02 | Two-Time Relativistic Bohmian Model of Quantum Mechanics | Giuseppe Raguní et.al. | 2409.09049 | null |
| 2024-09-13 | Dynamic Simultaneous Multithreaded Architecture | Daniel Ortiz-Arroyo et.al. | 2409.07903 | null |
| 2024-09-09 | DNA sequence alignment: An assignment for OpenMP, MPI, and CUDA/OpenCL | Arturo Gonzalez-Escribano et.al. | 2409.06075 | null |
| 2024-10-05 | Predicting Foreign Exchange EUR/USD direction using machine learning | Kevin Cedric Guyard et.al. | 2409.04471 | null |
| 2024-09-05 | Evidence for Dust Depletion in a Misaligned Protoplanetary Disk with JWST | C. C. Espaillat et.al. | 2409.03702 | null |
| 2024-09-04 | Cavitating bubbles in condensing gas as a means of forming clumps, chondrites, and planetesimals | Eugene Chiang et.al. | 2409.02978 | null |
| 2024-09-03 | Light-Ray Wave Functions and Integrability | Alexandre Homrich et.al. | 2409.02160 | null |
| 2024-09-03 | Foreactor: Exploiting Storage I/O Parallelism with Explicit Speculation | Guanzhou Hu et.al. | 2409.01580 | null |
| 2024-09-02 | A Comprehensive Analysis of the Future of Atomically Precise Manufacturing | Vadym Shvydun et.al. | 2409.00955 | null |
| 2024-08-30 | Dynamic Depth Decoding: Faster Speculative Decoding for LLMs | Oscar Brown et.al. | 2409.00142 | null |
| 2024-08-29 | LightSLH: Provable and Low-Overhead Spectre v1 Mitigation through Targeted Instruction Hardening | Yiming Zhu et.al. | 2408.16220 | null |
| 2024-08-28 | An Empirical Study of API Misuses of Data-Centric Libraries | Akalanka Galappaththi et.al. | 2408.15853 | null |
| 2024-08-28 | Indirect nonlinear interaction between toroidal Alfvén eigenmode and ion temperature gradient mode mediated by zonal structures | Qian Fang et.al. | 2408.15782 | null |
| 2025-02-27 | Learning Harmonized Representations for Speculative Sampling | Lefan Zhang et.al. | 2408.15766 | null |
| 2024-08-29 | Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation | Lujun Gui et.al. | 2408.15562 | null |
| 2024-11-18 | The companion mass distribution of post common envelope hot subdwarf binaries: evidence for boosted and disrupted magnetic braking? | Lisa Blomberg et.al. | 2408.15334 | null |
| 2024-08-27 | The Way To Circumbinary Planets | Hans J Deeg et.al. | 2408.15307 | null |
| 2024-12-26 | The Mamba in the Llama: Distilling and Accelerating Hybrid Models | Junxiong Wang et.al. | 2408.15237 | link |
| 2024-08-26 | SO as shock tracer in protoplanetary disks: the AB Aurigae case | A. Dutrey et.al. | 2408.14276 | null |
| 2024-08-25 | The origins of noise in the Zeeman splitting of spin qubits in natural-silicon devices | Juan S. Rojas-Arias et.al. | 2408.13707 | null |
| 2024-07-22 | Simopt – Simulation pass for Speculative Optimisation of FPGA-CAD flow | Eashan Wadhwa et.al. | 2408.12676 | null |
| 2024-12-19 | Exposing Shadow Branches | Chrysanthos Pepi et.al. | 2408.12592 | null |
| 2024-08-22 | Enhancing Causal Discovery in Financial Networks with Piecewise Quantile Regression | Cameron Cornell et.al. | 2408.12210 | null |
| 2024-08-21 | Electrostatic Origins of the Dirichlet Principle | Steven Deckelman et.al. | 2408.12002 | null |
| 2024-09-04 | Parallel Speculative Decoding with Adaptive Draft Length | Tianyu Liu et.al. | 2408.11850 | link |
| 2024-08-21 | Chemical models of interstellar glycine and adenine precursor aminoacetonitrile (NH2CH2CN) | Xia Zhang et.al. | 2408.11776 | null |
| 2024-08-20 | High detection significance of the dark substructure in gravitational lens SDSSJ0946+1006 is revealed by image pixel supersampling | Quinn E. Minor et.al. | 2408.11090 | null |
| 2024-08-23 | MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding | Jian Chen et.al. | 2408.11049 | link |
| 2024-08-20 | Revisiting the measurements and interpretations of DLVO forces | Bo Feng et.al. | 2408.10870 | null |
| 2024-08-19 | Constraining the Generalized Tolman-Oppenheimer-Volkoff (GTOV) equation with Bayesian analysis | Franciele M. da Silva et.al. | 2408.10425 | null |
| 2024-08-18 | A new measure of risk using Fourier analysis | Michael Grabinski et.al. | 2408.10279 | null |
| 2024-08-19 | Excitonic-trion population in two-dimensional halide perovskites | Efstratios Manousakis et.al. | 2408.10097 | null |
| 2024-08-16 | Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling | Xianzhen Luo et.al. | 2408.08696 | null |
| 2024-08-15 | KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning | Kaiqi Zhang et.al. | 2408.08146 | null |
| 2024-08-19 | Coupling without Communication and Drafter-Invariant Speculative Decoding | Majid Daliri et.al. | 2408.07978 | link |
| 2024-12-06 | The Small Sizes and High Implied Densities of ‘Little Red Dots’ with Balmer Breaks Could Explain Their Broad Emission Lines Without an AGN | Josephine F. W. Baggen et.al. | 2408.07745 | null |
| 2024-08-14 | Only One Relation Possible? Modeling the Ambiguity in Event Temporal Relation Extraction | Yutong Hu et.al. | 2408.07353 | null |
| 2024-07-23 | Stablecoin Runs and Disclosure Policy in the Presence of Large Sales | Brian Zhu et.al. | 2408.07227 | null |
| 2024-08-13 | Speculations on Uncertainty and Humane Algorithms | Nicholas Gray et.al. | 2408.06736 | null |
| 2024-08-15 | Inefficiencies of Carbon Trading Markets | Nicola Borri et.al. | 2408.06497 | null |
| 2024-08-12 | Correct Wrong Path | Bhargav Reddy Godala et.al. | 2408.05912 | null |
| 2024-08-11 | A Decoding Acceleration Framework for Industrial Deployable LLM-based Recommender Systems | Yunjia Xi et.al. | 2408.05676 | link |
| 2024-08-16 | Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion | Jacob K Christopher et.al. | 2408.05636 | null |
| 2024-08-09 | Recurrent Stochastic Fluctuations with Financial Speculation | Tomohiro Hirano et.al. | 2408.05047 | null |
| 2024-08-08 | HotStuff-1: Linear Consensus with One-Phase Speculation | Dakai Kang et.al. | 2408.04728 | null |
| 2024-08-08 | CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding | Sophia Ho et.al. | 2408.04678 | null |
| 2024-08-08 | Black hole mass and optical radiation mechanism of the tidal disruption event AT 2023clx | Shiyan Zhong et.al. | 2408.04448 | null |
| 2024-08-05 | Rich dynamical behaviors from a digital reversal operation | Yannis Almirantis et.al. | 2408.02527 | null |
| 2024-08-08 | A speculative model for cyclic information preservation in Kerr-Newman spacetime using closed timelike curves | Aviral Damle et.al. | 2408.02116 | null |
| 2024-08-06 | Selection bias obfuscates the discovery of fast radio burst sources | Mohit Bhardwaj et.al. | 2408.01876 | null |
| 2024-08-03 | Dissolution zone model of the oxide structure in additively manufactured dispersion-strengthened alloys | Wenyuan Hou et.al. | 2408.01845 | null |
| 2024-08-02 | AT2023vto: An Exceptionally Luminous Helium Tidal Disruption Event from a Massive Star | Harsh Kumar et.al. | 2408.01482 | null |
| 2024-08-01 | Granting GPT-4 License and Opportunity: Enhancing Accuracy and Confidence Estimation for Few-Shot Event Detection | Steven Fincke et.al. | 2408.00914 | null |
| 2024-08-01 | Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding | Bin Xiao et.al. | 2408.00264 | null |
| 2024-07-31 | Designing Beyond Current Conceptualizations of Spaceflight Experiences | James Cole et.al. | 2408.00085 | null |
| 2024-07-31 | Revisiting the fundamental metallicity relation with observation and simulation | Chengyu Ma et.al. | 2407.21716 | null |
| 2024-07-31 | The Bulk Densities of Small Solar System Bodies as a Probe of Planetesimal Formation | Misako Tatsuuma et.al. | 2407.21386 | null |
| 2024-08-19 | Instantons and the Large N=4 Algebra | Edward Witten et.al. | 2407.20964 | null |
| 2024-07-17 | Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies | Lachlan McGinness et.al. | 2407.20244 | null |
| 2024-08-19 | Reduced decay in Josephson coupling across ferromagnetic junctions with spin-orbit coupling layers | Ivan Kindiak et.al. | 2407.19799 | null |
| 2024-07-26 | Ionized and cold gas components in low surface brightness galaxy AGC 102004 | Tian-Wen Cao et.al. | 2407.18530 | null |
| 2024-07-25 | Phase transitions in (2 + 1)D subsystem-symmetric monitored quantum circuits | Cole Kelson-Packer et.al. | 2407.18340 | null |
| 2024-08-31 | Uniqueness of an $E_8$ model of elementary particles | Robert A. Wilson et.al. | 2407.18279 | null |
| 2024-07-24 | Automorphisms of Calabi-Yau threefolds from algebraic dynamics and the second Chern class | Keiji Oguiso et.al. | 2407.17297 | null |
| 2024-07-24 | Mapping the individual, social, and biospheric impacts of Foundation Models | Andrés Domínguez Hernández et.al. | 2407.17129 | null |
| 2024-07-04 | Integrated Deflector Shield Technology for Spacecraft | Florian Neukart et.al. | 2407.16701 | null |
| 2024-07-23 | Graph-Structured Speculative Decoding | Zhuocheng Gong et.al. | 2407.16207 | null |
| 2024-07-22 | AI for Handball: predicting and explaining the 2024 Olympic Games tournament with Deep Learning and Large Language Models | Florian Felice et.al. | 2407.15987 | null |
| 2024-07-22 | An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph | B. Kaan Karamete et.al. | 2407.15906 | null |
| 2024-07-23 | Unveiling the Multifaceted GRB 200613A: Prompt Emission Dynamics, Afterglow Evolution, and the Host Galaxy’s Properties | Shao-Yu Fu et.al. | 2407.15824 | null |
| 2024-11-21 | SNIP: Speculative Execution and Non-Interference Preservation for Compiler Transformations | Sören van der Wall et.al. | 2407.15080 | null |
| 2024-10-21 | Is the difference between deep hedging and delta hedging a statistical arbitrage? | Pascal François et.al. | 2407.14736 | link |
| 2024-07-19 | Rational Bubbles: A Clarification | Tomohiro Hirano et.al. | 2407.14017 | null |
| 2024-07-18 | Surface roughening in nanoparticle catalysts | Cameron J. Owen et.al. | 2407.13643 | null |
| 2024-07-18 | SecScale: A Scalable and Secure Trusted Execution Environment for Servers | Ani Sunny et.al. | 2407.13572 | null |
| 2024-07-17 | RTL Verification for Secure Speculation Using Contract Shadow Logic | Qinhan Tan et.al. | 2407.12232 | null |
| 2024-07-16 | Breakup dynamics of a neutron-halo projectile on heavy target at deep sub-barrier energies | B. Mukeru et.al. | 2407.12129 | null |
| 2024-11-16 | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | Branden Butler et.al. | 2407.11798 | null |
| 2024-10-02 | Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference | Zongyue Qin et.al. | 2407.09722 | null |
| 2024-07-17 | Accelerating the inference of string generation-based chemical reaction models for industrial applications | Mikhail Andronov et.al. | 2407.09685 | null |
| 2024-09-12 | Krylov complexity and chaos in deformed SYK models | Shira Chapman et.al. | 2407.09604 | null |
| 2024-07-21 | 6G: The Intelligent Network of Everything – A Comprehensive Vision, Survey, and Tutorial | Harri Pennanen et.al. | 2407.09398 | null |
| 2024-07-11 | Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting | Zilong Wang et.al. | 2407.08223 | null |
| 2024-07-10 | Purity benchmarking study of error coherence in a single Xmon qubit | Auda Zhu et.al. | 2407.07960 | null |
| 2024-07-10 | Carbon Pricing and Resale in Emission Trading Systems | Peyman Khezr et.al. | 2407.07386 | null |
| 2024-08-21 | Fuzzy Spheres in Stringy Matrix Models: Quantifying Chaos in a Mixed Phase Space | Paolo Amore et.al. | 2407.07259 | null |
| 2024-07-09 | Revolutionizing Battery Disassembly: The Design and Implementation of a Battery Disassembly Autonomous Mobile Manipulator Robot(BEAM-1) | Yanlong Peng et.al. | 2407.06590 | null |
| 2024-07-05 | Statistical investigations into the geometry and homology of random programs | Jon Sporring et.al. | 2407.04854 | null |
| 2024-07-05 | Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models | Bolaji Yusuf et.al. | 2407.04641 | null |
| 2024-11-13 | Black Holes with a charged quantum dust core | R. Casadio et.al. | 2407.04146 | null |
| 2024-08-23 | A distance conjecture beyond moduli? | Cédric Debusschere et.al. | 2407.03715 | null |
| 2024-07-03 | Braneworld Black Bounce to Transversable Wormhole Analytically Connected to an asymptotically $AdS_5$ Boundary | T. M. Crispim et.al. | 2407.03528 | null |
| 2024-07-03 | Origin of anomalous magnetotransport in kagome superconductors AV$_3$Sb$_5$ (A=K,Rb,Cs) | A. E. Koshelev et.al. | 2407.03189 | null |
| 2024-09-24 | Large-scale ordered magnetic fields generated in mergers of helium white dwarfs | Rüdiger Pakmor et.al. | 2407.02566 | null |
| 2024-07-02 | A thermodynamic model of inflation without inflaton field | Jesus Anaya-Galeana et.al. | 2407.02429 | null |
| 2024-07-02 | MICONIC: JWST/MIRI MRS observations of the nuclear and circumnuclear regions of Mrk231 | A. Alonso-Herrero et.al. | 2407.02180 | null |
| 2024-07-02 | S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models | Parsa Kavehzadeh et.al. | 2407.01955 | null |
| 2024-08-31 | Description of molecular chirality and its analysis with high harmonic generation | Akihito Kato et.al. | 2407.01947 | null |
| 2024-07-01 | Universal properties of residual moments in heavy-fermion metals | Ewan Scott et.al. | 2407.01218 | null |
| 2024-07-01 | Staying vigilant in the Age of AI: From content generation to content authentication | Yufan Li et.al. | 2407.00922 | null |
| 2025-04-14 | Block Verification Accelerates Speculative Decoding | Ziteng Sun et.al. | 2403.10444 | null |
| 2024-03-06 | Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement | Wonseok Jeon et.al. | 2402.14160 | null |
| 2025-07-08 | Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding | Zhuoming Chen et.al. | 2402.12374 | null |
| 2025-02-06 | Decoding Speculative Decoding | Minghao Yan et.al. | 2402.01528 | null |
| 2024-04-10 | Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO | Haim Barad et.al. | 2311.04951 | null |
| 2023-08-10 | Accelerating LLM Inference with Staged Speculative Decoding | Benjamin Spector et.al. | 2308.04623 | null |
| 2023-05-22 | Fast Inference from Transformers via Speculative Decoding | Yaniv Leviathan et.al. | 2211.17192 | null |
| 2023-10-31 | Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation | Heming Xia et.al. | 2203.16487 | null |
Multimodal System
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2026-03-31 | GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation | Rui Xie et.al. | 2603.26266 | null |
| 2026-03-26 | DRiffusion: Draft-and-Refine Process Parallelizes Diffusion Models with Ease | Runsheng Bai et.al. | 2603.25872 | null |
| 2026-04-01 | DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving | Pengxuan Yang et.al. | 2603.24587 | null |
| 2026-04-01 | SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems | Chung-En Johnny Yu et.al. | 2603.23853 | null |
| 2026-03-19 | 6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models | Rundong Su et.al. | 2603.18742 | null |
| 2026-03-18 | DriveVLM-RL: Neuroscience-Inspired Reinforcement Learning with Vision-Language Models for Safe and Deployable Autonomous Driving | Zilin Huang et.al. | 2603.18315 | null |
| 2026-03-13 | Draft-and-Target Sampling for Video Generation Policy | Qikang Zhang et.al. | 2603.13438 | null |
| 2026-02-20 | Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning | Earl J St Sauver et.al. | 2603.13243 | null |
| 2026-03-11 | COHORT: Hybrid RL for Collaborative Large DNN Inference on Multi-Robot Systems Under Real-Time Constraints | Mohammad Saeid Anwar et.al. | 2603.10436 | null |
| 2026-03-09 | SoundWeaver: Semantic Warm-Starting for Text-to-Audio Diffusion Serving | Ayush Barik et.al. | 2603.07865 | null |
| 2026-03-08 | MWM: Mobile World Models for Action-Conditioned Consistent Prediction | Han Yan et.al. | 2603.07799 | null |
| 2026-02-27 | SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching | Yasaman Haghighi et.al. | 2602.24208 | null |
| 2026-02-26 | LE-NeuS: Latency-Efficient Neuro-Symbolic Video Understanding via Adaptive Temporal Verification | Shawn Liang et.al. | 2602.23553 | null |
| 2026-02-17 | Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs | Libo Zhang et.al. | 2602.15318 | null |
| 2026-02-13 | AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers | Dong Liu et.al. | 2602.13357 | null |
| 2026-02-11 | FastUSP: A Multi-Level Collaborative Acceleration Framework for Distributed Diffusion Model Inference | Guandong Li et.al. | 2602.10940 | null |
| 2026-02-24 | Mapping Gemma3 onto an Edge Dataflow Architecture | Shouyu Du et.al. | 2602.06063 | null |
| 2026-02-04 | Annotation Free Spacecraft Detection and Segmentation using Vision Language Models | Samet Hicsonmez et.al. | 2602.04699 | null |
| 2026-02-05 | PIO-FVLM: Rethinking Training-Free Visual Token Reduction for VLM Acceleration from an Inference-Objective Perspective | Haokui Zhang et.al. | 2602.04657 | null |
| 2026-02-03 | ScDiVa: Masked Discrete Diffusion for Joint Modeling of Single-Cell Identity and Expression | Mingxuan Wang et.al. | 2602.03477 | null |
| 2026-02-03 | SwiftVLM: Efficient Vision-Language Model Inference via Cross-Layer Token Bypass | Chen Qian et.al. | 2602.03134 | null |
| 2026-01-31 | APEX: A Decoupled Memory-based Explorer for Asynchronous Aerial Object Goal Navigation | Daoxuan Zhang et.al. | 2602.00551 | null |
| 2026-01-20 | Likelihood-Separable Diffusion Inference for Multi-Image MRI Super-Resolution | Samuel W. Remedios et.al. | 2601.14030 | null |
| 2026-01-19 | AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation | Xuecheng Chen et.al. | 2601.12742 | null |
| 2026-01-26 | ViSIL: Unified Evaluation of Information Loss in Multimodal Video Captioning | Po-han Li et.al. | 2601.09851 | null |
| 2025-12-30 | Bridging the Perception-Cognition Gap:Re-engineering SAM2 with Hilbert-Mamba for Robust VLM-based Medical Diagnosis | Hao Wu et.al. | 2512.24013 | null |
| 2025-12-29 | Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution | Hexin Zhang et.al. | 2512.23532 | null |
| 2025-12-23 | Input-Adaptive Visual Preprocessing for Efficient Fast Vision-Language Model Inference | Putu Indah Githa Cahyani et.al. | 2512.20839 | null |
| 2025-12-21 | AsyncDiff: Asynchronous Timestep Conditioning for Enhanced Text-to-Image Diffusion Inference | Longhuan Xu et.al. | 2512.18675 | null |
| 2025-12-18 | Collaborative Edge-to-Server Inference for Vision-Language Models | Soochang Song et.al. | 2512.16349 | null |
| 2025-12-16 | Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models | Chiyue Wei et.al. | 2512.14661 | null |
| 2025-12-10 | LISN: Language-Instructed Social Navigation with VLM-based Controller Modulating | Junting Chen et.al. | 2512.09920 | null |
| 2025-12-05 | Training-Time Action Conditioning for Efficient Real-Time Chunking | Kevin Black et.al. | 2512.05964 | null |
| 2025-12-05 | Quantitatively mapping the Eady model onto a two-layer quasi-geostrophic model | Julie Meunier et.al. | 2512.05902 | null |
| 2025-12-05 | Non-equilibrium formulation for inertial particles in turbulent swirling flows | Bernardo L. Español et.al. | 2512.05855 | null |
| 2025-12-05 | HQ-DM: Single Hadamard Transformation-Based Quantization-Aware Training for Low-Bit Diffusion Models | Shizhuo Mao et.al. | 2512.05746 | null |
| 2025-12-05 | ProPhy: Progressive Physical Alignment for Dynamic World Simulation | Zijun Wang et.al. | 2512.05564 | null |
| 2025-12-05 | Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models | Weijue Bu et.al. | 2512.05546 | null |
| 2025-12-04 | Uncertainty Quantification for Scientific Machine Learning using Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN) | Y. Sungtaek Ju et.al. | 2512.05306 | null |
| 2025-12-04 | CFO: Learning Continuous-Time PDE Dynamics via Flow-Matched Neural Operators | Xianglong Hou et.al. | 2512.05297 | null |
| 2025-12-04 | XR-DT: Extended Reality-Enhanced Digital Twin for Agentic Mobile Robots | Tianyi Wang et.al. | 2512.05270 | null |
| 2025-12-04 | NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation | Yu Zeng et.al. | 2512.05106 | null |
| 2025-12-04 | TV2TV: A Unified Framework for Interleaved Language and Video Generation | Xiaochuang Han et.al. | 2512.05103 | null |
| 2025-12-04 | Hybrid-Diffusion Models: Combining Open-loop Routines with Visuomotor Diffusion Policies | Jonne Van Haastregt et.al. | 2512.04960 | null |
| 2025-12-04 | FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via neural Action Tokenization | Yicheng Liu et.al. | 2512.04952 | null |
| 2025-12-04 | YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance | Junjie Zheng et.al. | 2512.04779 | null |
| 2025-12-04 | MemLoRA: Distilling Expert Adapters for On-Device Memory Systems | Massimo Bini et.al. | 2512.04763 | null |
| 2025-12-04 | E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving | Yihong Tang et.al. | 2512.04733 | null |
| 2025-12-04 | Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild | Yigui Feng et.al. | 2512.04728 | null |
| 2025-12-05 | Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length | Yubo Huang et.al. | 2512.04677 | null |
| 2025-12-04 | Persson’s Theory of Purely Normal Elastic Rough Surface Contact: A Tutorial Based on Stochastic Process Theory | Yang Xu et.al. | 2512.04648 | null |
| 2025-12-04 | VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory | Yifei Yu et.al. | 2512.04519 | null |
| 2025-12-04 | GuidNoise: Single-Pair Guided Diffusion for Generalized Noise Synthesis | Changjin Kim et.al. | 2512.04456 | null |
| 2025-12-04 | NORi: An ML-Augmented Ocean Boundary Layer Parameterization | Xin Kai Lee et.al. | 2512.04452 | null |
| 2025-12-04 | FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination | Chengyang He et.al. | 2512.04381 | null |
| 2025-12-03 | Decoding Large Language Diffusion Models with Foreseeing Movement | Yichuan Mo et.al. | 2512.04135 | null |
| 2025-12-03 | DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment | Sheng-Hao Liao et.al. | 2512.03981 | null |
| 2025-12-03 | Refining Machine Learning Potentials through Thermodynamic Theory of Phase Transitions | Paul Fuchs et.al. | 2512.03974 | null |
| 2025-12-03 | Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face Personalization | Lianyu Pang et.al. | 2512.03964 | null |
| 2025-12-03 | OmniDexVLG: Learning Dexterous Grasp Generation from Vision Language Model-Guided Grasp Semantics, Taxonomy and Functional Affordance | Lei Zhang et.al. | 2512.03874 | null |
| 2025-12-03 | Fully Unsupervised Self-debiasing of Text-to-Image Diffusion Models | Korada Sri Vardhana et.al. | 2512.03749 | null |
| 2025-12-03 | PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention | Ziwen Li et.al. | 2512.03724 | null |
| 2025-12-03 | GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces | Melis Ocal et.al. | 2512.03683 | null |
| 2025-12-03 | ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers | Feice Huang et.al. | 2512.03673 | null |
| 2025-12-03 | V-ITI: Mitigating Hallucinations in Multimodal Large Language Models via Visual Inference-Time Intervention | Nan Sun et.al. | 2512.03542 | null |
| 2025-12-03 | CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving | Zhijian Qiao et.al. | 2512.03510 | null |
| 2025-12-03 | KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion Models | Rhys Newbury et.al. | 2512.03450 | null |
| 2025-12-03 | MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification | Yujian Zhao et.al. | 2512.03404 | null |
| 2025-12-03 | Push-broom Mapping of Galaxies and Supernova Remnants with the SPRITE CubeSat | Elena Carlson et.al. | 2512.03329 | null |
| 2025-12-02 | Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time | Daniel D. Richman et.al. | 2512.03312 | null |
| 2025-12-02 | Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling | Yueru Jia et.al. | 2512.03044 | null |
| 2025-12-03 | LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization | Zhihan Xiao et.al. | 2512.02933 | null |
| 2025-12-02 | AutoNeural: Co-Designing Vision-Language Models for NPU Inference | Wei Chen et.al. | 2512.02924 | null |
| 2025-12-02 | Glance: Accelerating Diffusion Models with 1 Sample | Zhuobai Dong et.al. | 2512.02899 | null |
| 2025-12-03 | SwarmDiffusion: End-To-End Traversability-Guided Diffusion for Embodiment-Agnostic Navigation of Heterogeneous Robots | Iana Zhura et.al. | 2512.02851 | null |
| 2025-12-02 | Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach | Siyuan Yang et.al. | 2512.02834 | null |
| 2025-12-02 | Reasoning-Aware Multimodal Fusion for Hateful Video Detection | Shuonan Yang et.al. | 2512.02743 | null |
| 2025-12-02 | VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm | Zhenkai Wu et.al. | 2512.02700 | null |
| 2025-12-02 | PGP-DiffSR: Phase-Guided Progressive Pruning for Efficient Diffusion-based Image Super-Resolution | Zhongbao Yang et.al. | 2512.02681 | null |
| 2025-12-02 | Spatially-Grounded Document Retrieval via Patch-to-Region Relevance Propagation | Agathoklis Georgiou et.al. | 2512.02660 | null |
| 2025-12-02 | Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training | Hong-Jie You et.al. | 2512.02652 | null |
| 2025-12-02 | YingVideo-MV: Music-Driven Multi-Stage Video Generation | Jiahui Chen et.al. | 2512.02492 | null |
| 2025-12-02 | Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources | Phuc Pham et.al. | 2512.02438 | null |
| 2025-12-02 | VACoT: Rethinking Visual Data Augmentation with VLMs | Zhengzhuo Xu et.al. | 2512.02361 | null |
| 2025-12-02 | Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective | Qiyao Xue et.al. | 2512.02340 | null |
| 2025-12-01 | ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation | Chenyang Gu et.al. | 2512.02013 | null |
| 2025-12-01 | Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive Decoding | Zahra Mahdavi et.al. | 2512.01922 | null |
| 2025-12-01 | Deconstructing Generative Diversity: An Information Bottleneck Analysis of Discrete Latent Generative Models | Yudi Wu et.al. | 2512.01831 | null |
| 2025-12-01 | CauSight: Learning to Supersense for Visual Causal Discovery | Yize Zhang et.al. | 2512.01827 | null |
| 2025-12-01 | Weight Space Representation Learning with Neural Fields | Zhuoqian Yang et.al. | 2512.01759 | null |
| 2025-12-01 | DiG-Flow: Discrepancy-Guided Flow Matching for Robust VLA Models | Wanpeng Zhang et.al. | 2512.01715 | null |
| 2025-12-01 | DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models | Patrick Kwon et.al. | 2512.01686 | null |
| 2025-12-01 | GRASP: Guided Residual Adapters with Sample-wise Partitioning | Felix Nützel et.al. | 2512.01675 | null |
| 2025-12-01 | SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge | Yumeng He et.al. | 2512.01629 | null |
| 2025-12-01 | Reconstructing Multi-Scale Physical Fields from Extremely Sparse Measurements with an Autoencoder-Diffusion Cascade | Letian Yi et.al. | 2512.01572 | null |
| 2025-12-01 | Hawkes process with a diffusion-driven baseline: long-run behavior, inference, statistical tests | Maya Sadeler Perrin et.al. | 2512.01447 | null |
| 2025-12-01 | Existence of two thresholds in a bistable equation with nonlocal competition | Matthieu Alfaro et.al. | 2512.01435 | null |
| 2025-12-01 | MDiff4STR: Mask Diffusion Model for Scene Text Recognition | Yongkun Du et.al. | 2512.01422 | null |
| 2025-12-01 | FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution | Seungho Choi et.al. | 2512.01390 | null |
| 2025-12-01 | Consistency Flow Model Achieves One-step Denoising Error Correction Codes | Haoyu Lei et.al. | 2512.01389 | null |
| 2025-12-01 | Qualitatively distinct mechanisms of noise-induced escape in diffusively coupled bistable elements | Hidemasa Ishii et.al. | 2512.01388 | null |
| 2025-12-01 | Beyond Loss Guidance: Using PDE Residuals as Spectral Attention in Diffusion Neural Operators | Medha Sawhney et.al. | 2512.01370 | null |
| 2025-12-01 | TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance | Pei Yang et.al. | 2512.01314 | null |
| 2025-12-01 | Inversions of stochastic processes from ergodic measures of Nonlinear SDEs | Hongyu Liu et.al. | 2512.01307 | null |
| 2025-11-30 | PIANO: Physics-informed Dual Neural Operator for Precipitation Nowcasting | Seokhyun Chin et.al. | 2512.01062 | null |
| 2025-11-29 | EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients | He-Yen Hsieh et.al. | 2512.00670 | null |
| 2025-11-28 | Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent | Jianzhe Lin et.al. | 2511.23436 | null |
| 2025-11-28 | LFM2 Technical Report | Alexander Amini et.al. | 2511.23404 | null |
| 2025-11-28 | SafeHumanoid: VLM-RAG-driven Control of Upper Body Impedance for Humanoid Robot | Yara Mahmoud et.al. | 2511.23300 | null |
| 2025-11-28 | Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering | Qiming Li et.al. | 2511.23231 | null |
| 2025-11-28 | Obstruction reasoning for robotic grasping | Runyu Jiao et.al. | 2511.23186 | null |
| 2025-11-28 | InstanceV: Instance-Level Video Generation | Yuheng Chen et.al. | 2511.23146 | null |
| 2025-11-28 | db-SP: Accelerating Sparse Attention for Visual Generative Models with Dual-Balanced Sequence Parallelism | Siqi Chen et.al. | 2511.23113 | null |
| 2025-11-28 | MindPower: Enabling Theory-of-Mind Reasoning in VLM-based Embodied Agents | Ruoxuan Zhang et.al. | 2511.23055 | null |
| 2025-11-28 | Time Extrapolation with Graph Convolutional Autoencoder and Tensor Train Decomposition | Yuanhong Chen et.al. | 2511.23037 | null |
| 2025-11-28 | Masked Diffusion for Generative Recommendation | Kulin Shah et.al. | 2511.23021 | null |
| 2025-11-28 | BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation | Zeyu Zhang et.al. | 2511.22973 | null |
| 2025-11-28 | Seeing before Observable: Potential Risk Reasoning in Autonomous Driving via Vision Language Models | Jiaxin Liu et.al. | 2511.22928 | null |
| 2025-11-27 | CAPE: Context-Aware Diffusion Policy Via Proximal Mode Expansion for Collision Avoidance | Rui Heng Yang et.al. | 2511.22773 | null |
| 2025-11-27 | Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer | Z-Image Team et.al. | 2511.22699 | null |
| 2025-11-27 | Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield | Dongyang Liu et.al. | 2511.22677 | null |
| 2025-11-27 | VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models | Silin Cheng et.al. | 2511.22664 | null |
| 2025-11-27 | Geometrically-Constrained Agent for Spatial Reasoning | Zeren Chen et.al. | 2511.22659 | null |
| 2025-11-27 | Beyond Success: Refining Elegant Robot Manipulation from Mixed-Quality Data via Just-in-Time Intervention | Yanbo Mao et.al. | 2511.22555 | null |
| 2025-11-27 | Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration | Mengyu Yang et.al. | 2511.22533 | null |
| 2025-11-27 | CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving | Zhaohui Wang et.al. | 2511.22532 | null |
| 2025-11-26 | Canvas-to-Image: Compositional Image Generation with Multimodal Controls | Yusuf Dalva et.al. | 2511.21691 | null |
| 2025-11-26 | Model-Based Policy Adaptation for Closed-Loop End-to-End Autonomous Driving | Haohong Lin et.al. | 2511.21584 | null |
| 2025-11-26 | Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy | Teng Hu et.al. | 2511.21579 | null |
| 2025-11-26 | IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference | Wanli Zhong et.al. | 2511.21513 | null |
| 2025-11-26 | MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices | Shuai Zhang et.al. | 2511.21475 | null |
| 2025-11-26 | Odin: Oriented Dual-module Integration for Text-rich Network Representation Learning | Kaifeng Hong et.al. | 2511.21416 | null |
| 2025-11-26 | From Diffusion to One-Step Generation: A Comparative Study of Flow-Based Models with Application to Image Inpainting | Umang Agarwal et.al. | 2511.21215 | null |
| 2025-11-26 | Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models | Changlin Li et.al. | 2511.21122 | null |
| 2025-11-26 | From Bits to Rounds: Parallel Decoding with Exploration for Diffusion Language Models | Hengyu Fu et.al. | 2511.21103 | null |
| 2025-11-26 | OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection | Chujie Wang et.al. | 2511.21064 | null |
| 2025-11-26 | GuardTrace-VL: Detecting Unsafe Multimodel Reasoning via Iterative Safety Supervision | Yuxiao Xiang et.al. | 2511.20994 | null |
| 2025-11-25 | Dynamic Test-Time Compute Scaling in Control Policy: Difficulty-Aware Stochastic Interpolant Policy | Inkook Chun et.al. | 2511.20906 | null |
| 2025-11-25 | Test-Time Alignment of Text-to-Image Diffusion Models via Null-Text Embedding Optimisation | Taehoon Kim et.al. | 2511.20889 | null |
| 2025-11-25 | Symbiotic Brain-Machine Drawing via Visual Brain-Computer Interfaces | Gao Wang et.al. | 2511.20835 | null |
| 2025-11-25 | Training-Free Diffusion Priors for Text-to-Image Generation via Optimization-based Visual Inversion | Samuele Dell’Erba et.al. | 2511.20821 | null |
| 2025-11-25 | Text-Guided Semantic Image Encoder | Raghuveer Thirukovalluru et.al. | 2511.20770 | null |
| 2025-11-25 | Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout | Hidir Yesiltepe et.al. | 2511.20649 | null |
| 2025-11-25 | LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight | Yunze Man et.al. | 2511.20648 | null |
| 2025-11-25 | Image2Gcode: Image-to-G-code Generation for Additive Manufacturing Using Diffusion-Transformer Model | Ziyue Wang et.al. | 2511.20636 | null |
| 2025-11-25 | MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models | Chieh-Yun Chen et.al. | 2511.20629 | null |
| 2025-11-25 | Latent Diffusion Inversion Requires Understanding the Latent Space | Mingxing Rao et.al. | 2511.20592 | null |
| 2025-11-25 | Anatomica: Localized Control over Geometric and Topological Properties for Anatomical Diffusion Models | Karim Kadry et.al. | 2511.20587 | null |
| 2025-11-25 | Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models | Shamima Hossain et.al. | 2511.20531 | null |
| 2025-11-25 | Efficient and Fast Generative-Based Singing Voice Separation using a Latent Diffusion Model | Genís Plaja-Roglans et.al. | 2511.20470 | null |
| 2025-11-25 | Object-Centric Vision Token Pruning for Vision Language Models | Guangyuan Li et.al. | 2511.20439 | null |
| 2025-11-25 | Image-Free Timestep Distillation via Continuous-Time Consistency with Trajectory-Sampled Pairs | Bao Tang et.al. | 2511.20410 | null |
| 2025-11-25 | FREE: Uncertainty-Aware Autoregression for Parallel Diffusion Transformers | Xinwan Wen et.al. | 2511.20390 | null |
| 2025-11-25 | Modified Equations for Stochastic Optimization | Stefan Perko et.al. | 2511.20322 | null |
| 2025-11-25 | TReFT: Taming Rectified Flow Models For One-Step Image Translation | Shengqian Li et.al. | 2511.20307 | null |
| 2025-11-25 | HVAdam: A Full-Dimension Adaptive Optimizer | Yiheng Zhang et.al. | 2511.20277 | null |
| 2025-11-25 | Rectified Flow for Vision-Aided mmWave V2I Beam Prediction | Can Zheng et.al. | 2511.20265 | null |
| 2025-11-25 | In-Context Compositional Learning via Sparse Coding Transformer | Wei Chen et.al. | 2511.20194 | null |
| 2025-11-25 | Spatially Resolved Plasma Diagnostics of the Supernova Remnant DEM L71 using the Reflection Grating Spectrometer | Yuki Amano et.al. | 2511.20112 | null |
| 2025-11-25 | iRadioDiff: Physics-Informed Diffusion Model for Indoor Radio Map Construction and Localization | Xiucheng Wang et.al. | 2511.20015 | null |
| 2025-11-25 | CounterVQA: Evaluating and Improving Counterfactual Reasoning in Vision-Language Models for Video Understanding | Yuefei Chen et.al. | 2511.19923 | null |
| 2025-11-25 | Scale Where It Matters: Training-Free Localized Scaling for Diffusion Models | Qin Ren et.al. | 2511.19917 | null |
| 2025-11-24 | Mixture of Horizons in Action Chunking | Dong Jing et.al. | 2511.19433 | null |
| 2025-11-24 | Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens | Yiming Qin et.al. | 2511.19418 | null |
| 2025-11-24 | Predicting partially observable dynamical systems via diffusion models with a multiscale inference scheme | Rudy Morel et.al. | 2511.19390 | null |
| 2025-11-24 | Efficiency vs. Fidelity: A Comparative Analysis of Diffusion Probabilistic Models and Flow Matching on Low-Resource Hardware | Srishti Gupta et.al. | 2511.19379 | null |
| 2025-11-24 | DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation | Zehong Ma et.al. | 2511.19365 | null |
| 2025-11-24 | Rethinking Intermediate Representation for VLM-based Robot Manipulation | Weiliang Tang et.al. | 2511.19315 | null |
| 2025-11-24 | CDLM: Consistency Diffusion Language Models For Faster Sampling | Minseo Kim et.al. | 2511.19269 | null |
| 2025-11-24 | SimDiff: Simpler Yet Better Diffusion Model for Time Series Point Forecasting | Hang Ding et.al. | 2511.19256 | null |
| 2025-11-24 | Learning Plug-and-play Memory for Guiding Video Diffusion Models | Selena Song et.al. | 2511.19229 | null |
| 2025-11-24 | EEG-VLM: A Hierarchical Vision-Language Model with Multi-Level Feature Alignment and Visually Enhanced Language-Guided Reasoning for EEG Image-Based Sleep Stage Prediction | Xihe Qiu et.al. | 2511.19155 | null |
| 2025-11-24 | MonoSR: Open-Vocabulary Spatial Reasoning from Monocular Images | Qirui Wang et.al. | 2511.19119 | null |
| 2025-11-24 | A Self-Conditioned Representation Guided Diffusion Model for Realistic Text-to-LiDAR Scene Generation | Wentao Qu et.al. | 2511.19004 | null |
| 2025-11-24 | BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models | Juncheng Li et.al. | 2511.18921 | null |
| 2025-11-24 | EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models | Wenhao Xu et.al. | 2511.18920 | null |
| 2025-11-24 | MatMart: Material Reconstruction of 3D Objects via Diffusion | Xiuchao Wu et.al. | 2511.18900 | null |
| 2025-11-24 | Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference | Wengyi Zhan et.al. | 2511.18875 | null |
| 2025-11-24 | UNeMo: Collaborative Visual-Language Reasoning and Navigation via a Multimodal World Model | Changxin Huang et.al. | 2511.18845 | null |
| 2025-11-24 | DiP: Taming Diffusion Models in Pixel Space | Zhennan Chen et.al. | 2511.18822 | null |
| 2025-11-24 | Mitigating Long-Tail Bias in HOI Detection via Adaptive Diversity Cache | Yuqiu Jiang et.al. | 2511.18811 | null |
| 2025-11-24 | MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent | Yuxia Fu et.al. | 2511.18810 | null |
| 2025-11-21 | SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding | Nikolay Nikolov et.al. | 2511.17411 | null |
| 2025-11-21 | SpatialGeo:Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion | Jiajie Guo et.al. | 2511.17308 | null |
| 2025-11-21 | A Little More Like This: Text-to-Image Retrieval with Vision-Language Models Using Relevance Feedback | Bulat Khaertdinov et.al. | 2511.17255 | null |
| 2025-11-21 | FlexiFlow: decomposable flow matching for generation of flexible molecular ensemble | Riccardo Tedoldi et.al. | 2511.17249 | null |
| 2025-11-21 | FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle | Mario Markov et.al. | 2511.17171 | null |
| 2025-11-21 | One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution | Yushun Fang et.al. | 2511.17138 | null |
| 2025-11-21 | Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models | He Huang et.al. | 2511.17094 | null |
| 2025-11-21 | Diversity Has Always Been There in Your Visual Autoregressive Models | Tong Wang et.al. | 2511.17074 | null |
| 2025-11-21 | DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing | Hao Chen et.al. | 2511.17038 | null |
| 2025-11-21 | Energy Scaling Laws for Diffusion Models: Quantifying Compute and Carbon Emissions in Image Generation | Aniketh Iyengar et.al. | 2511.17031 | null |
| 2025-11-21 | VLM-Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions | Qianyi Shao et.al. | 2511.16998 | null |
| 2025-11-21 | MultiPriv: Benchmarking Individual-Level Privacy Reasoning in Vision-Language Models | Xiongtao Sun et.al. | 2511.16940 | null |
| 2025-11-21 | UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation | Chi Zhang et.al. | 2511.16917 | null |
| 2025-11-21 | Align & Invert: Solving Inverse Problems with Diffusion and Flow-based Models via Representational Alignment | Loukas Sfountouris et.al. | 2511.16870 | null |
| 2025-11-20 | Towards Unified Vision Language Models for Forest Ecological Analysis in Earth Observation | Xizhe Xue et.al. | 2511.16853 | null |
| 2025-11-20 | TRIM: Scalable 3D Gaussian Diffusion Inference with Temporal and Spatial Trimming | Zeyuan Yin et.al. | 2511.16642 | null |
| 2025-11-21 | VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference | Ziyan Liu et.al. | 2511.16449 | null |
| 2025-11-20 | Decoupling Complexity from Scale in Latent Diffusion Model | Tianxiong Zhong et.al. | 2511.16117 | null |
| 2025-11-20 | T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs | Shao-Jun Xia et.al. | 2511.16107 | null |
| 2025-11-20 | Learning Tractable Distributions Of Language Model Continuations | Gwen Yidou-Weng et.al. | 2511.16054 | null |
| 2025-11-20 | Understanding and improving axial detection in optical tweezers based on the interference of forward- and backward- scattered light | Isaac Pérez Castillo et.al. | 2511.16036 | null |
| 2025-11-20 | Physics-Guided Inductive Spatiotemporal Kriging for PM2.5 with Satellite Gradient Constraints | Shuo Wang et.al. | 2511.16013 | null |
| 2025-11-19 | Breaking the Bottleneck with DiffuApriel: High-Throughput Diffusion LMs with Mamba Backbone | Vaibhav Singh et.al. | 2511.15927 | null |
| 2025-11-19 | Think Visually, Reason Textually: Vision-Language Synergy in ARC | Beichen Zhang et.al. | 2511.15703 | null |
| 2025-11-19 | MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping | Yushi Huang et.al. | 2511.15690 | null |
| 2025-11-19 | Theoretical Closed-loop Stability Bounds for Dynamical System Coupled with Diffusion Policies | Gabriel Lauzier et.al. | 2511.15520 | null |
| 2025-11-19 | What Your Features Reveal: Data-Efficient Black-Box Feature Inversion Attack for Split DNNs | Zhihan Ren et.al. | 2511.15316 | null |
| 2025-11-19 | Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning | Yuxuan Gu et.al. | 2511.15190 | null |
| 2025-11-19 | A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models | Duo Li et.al. | 2511.15098 | null |
| 2025-11-19 | Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis | Chengyu Xie et.al. | 2511.15092 | null |
| 2025-11-19 | Reasoning via Video: The First Evaluation of Video Models’ Reasoning Abilities through Maze-Solving Tasks | Cheng Yang et.al. | 2511.15065 | null |
| 2025-11-19 | Aligning Generative Music AI with Human Preferences: Methods and Challenges | Dorien Herremans et.al. | 2511.15038 | null |
| 2025-11-18 | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | Antonia Ebner et.al. | 2511.14744 | null |
| 2025-11-18 | Oscillation Quenching Induced By Time-Varying Coupling Functions | Dushko Stavrov et.al. | 2511.14370 | null |
| 2025-11-18 | Bridging the Gap Between Bayesian Deep Learning and Ensemble Weather Forecasts | Xinlei Xiong et.al. | 2511.14218 | null |
| 2025-11-18 | InstantViR: Real-Time Video Inverse Problem Solver with Distilled Diffusion Prior | Weimin Bai et.al. | 2511.14208 | null |
| 2025-11-18 | Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion | Zhuo Li et.al. | 2511.14178 | null |
| 2025-11-18 | Run, Ruminate, and Regulate: A Dual-process Thinking System for Vision-and-Language Navigation | Yu Zhong et.al. | 2511.14131 | null |
| 2025-11-18 | Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations | Yiqing Shen et.al. | 2511.14100 | null |
| 2025-11-18 | GRPO Privacy Is at Risk: A Membership Inference Attack Against Reinforcement Learning With Verifiable Rewards | Yule Liu et.al. | 2511.14045 | null |
| 2025-11-18 | Flood-LDM: Generalizable Latent Diffusion Models for rapid and accurate zero-shot High-Resolution Flood Mapping | Sun Han Neo et.al. | 2511.14033 | null |
| 2025-11-17 | Single Tensor Cell Segmentation using Scalar Field Representations | Kevin I. Ruiz Vargas et.al. | 2511.13947 | null |
| 2025-11-17 | Mapping the Cosmic-Ray Ionization Rate in the Local Galaxy with H$_3^+$ | Nick Indriolo et.al. | 2511.13915 | null |
| 2025-11-17 | Distribution Matching Distillation Meets Reinforcement Learning | Dengyang Jiang et.al. | 2511.13649 | null |
| 2025-11-17 | CacheFlow: Compressive Streaming Memory for Efficient Long-Form Video Understanding | Shrenik Patel et.al. | 2511.13644 | null |
| 2025-11-17 | Semantic Document Derendering: SVG Reconstruction via Vision-Language Modeling | Adam Hazimeh et.al. | 2511.13478 | null |
| 2025-11-18 | Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline | Rui Zuo et.al. | 2511.13442 | null |
| 2025-11-17 | Local asymptotic normality for discretely observed McKean-Vlasov diffusions | Akram Heidari et.al. | 2511.13366 | null |
| 2025-11-17 | TransFit-CSM: A Fast, Physically Consistent Framework for Interaction-Powered Transients | Yu-Hao Zhang et.al. | 2511.13265 | null |
| 2025-11-17 | GenTract: Generative Global Tractography | Alec Sargood et.al. | 2511.13183 | null |
| 2025-11-17 | Conditional Diffusion Model for Multi-Agent Dynamic Task Decomposition | Yanda Zhu et.al. | 2511.13137 | null |
| 2025-11-17 | MergeSlide: Continual Model Merging and Task-to-Class Prompt-Aligned Inference for Lifelong Learning on Whole Slide Images | Doanh C. Bui et.al. | 2511.13099 | null |
| 2025-11-17 | MeanFlow Transformers with Representation Autoencoders | Zheyuan Hu et.al. | 2511.13019 | null |
| 2025-11-17 | SAGE: Spuriousness-Aware Guided Prompt Exploration for Mitigating Multimodal Bias | Wenqian Ye et.al. | 2511.13005 | null |
| 2025-11-17 | Infinite-Story: A Training-Free Consistent Text-to-Image Generation | Jihun Park et.al. | 2511.13002 | null |
| 2025-11-17 | Recurrent Autoregressive Diffusion: Global Memory Meets Local Attention | Taiye Chen et.al. | 2511.12940 | null |
| 2025-11-17 | Yanyun-3: Enabling Cross-Platform Strategy Game Operation with Vision-Language Models | Guoyan Wang et.al. | 2511.12937 | null |
| 2025-11-17 | Method of Manufactured Learning for Solver-free Training of Neural Operators | Arth Sojitra et.al. | 2511.12890 | null |
| 2025-11-17 | BrainNormalizer: Anatomy-Informed Pseudo-Healthy Brain Reconstruction from Tumor MRI via Edge-Guided ControlNet | Min Gu Kwak et.al. | 2511.12853 | null |
| 2025-11-16 | Prompt-Driven Domain Adaptation for End-to-End Autonomous Driving via In-Context RL | Aleesha Khurram et.al. | 2511.12755 | null |
| 2025-11-16 | Backdoor Attacks on Open Vocabulary Object Detectors via Multi-Modal Prompt Tuning | Ankita Raj et.al. | 2511.12735 | null |
| 2025-11-16 | QPU Micro-Kernels for Stencil Computation | Stefano Markidis et.al. | 2511.12617 | null |
| 2025-11-16 | CoTBox-TTT: Grounding Medical VQA with Visual Chain-of-Thought Boxes During Test-time Training | Jiahe Qian et.al. | 2511.12446 | null |
| 2025-11-16 | RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning | Jingqi Xu et.al. | 2511.12428 | null |
| 2025-11-14 | PAS : Prelim Attention Score for Detecting Object Hallucinations in Large Vision–Language Models | Nhat Hoang-Xuan et.al. | 2511.11502 | null |
| 2025-11-14 | Planetary nebulae as tracers of stellar population properties: a pilot study with MUSE | Ana Inés Ennis et.al. | 2511.11479 | null |
| 2025-11-14 | DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference | Farhana Amin et.al. | 2511.11446 | null |
| 2025-11-14 | BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning | Lan Li et.al. | 2511.11421 | null |
| 2025-11-14 | EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment | Ruoxi Cheng et.al. | 2511.11301 | null |
| 2025-11-14 | GraphPilot: Grounded Scene Graph Conditioning for Language-Based Autonomous Driving | Fabian Schmidt et.al. | 2511.11266 | null |
| 2025-11-14 | CountSteer: Steering Attention for Object Counting in Diffusion Models | Hyemin Boo et.al. | 2511.11253 | null |
| 2025-11-14 | Viper-F1: Fast and Fine-Grained Multimodal Understanding with Cross-Modal State-Space Modulation | Quoc-Huy Trinh et.al. | 2511.11177 | null |
| 2025-11-14 | Explainable Deep Convolutional Multi-Type Anomaly Detection | Alex George et.al. | 2511.11165 | null |
| 2025-11-14 | Non-Gaussianity-induced enhanced target-finding dynamics of confined colloids | Guirec de Tournemire et.al. | 2511.11117 | null |
| 2025-11-14 | Sheaf Cohomology of Linear Predictive Coding Networks | Jeffrey Seely et.al. | 2511.11092 | null |
| 2025-11-14 | SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation | Sumin Yu et.al. | 2511.11014 | null |
| 2025-11-14 | VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models | Xinlei Yu et.al. | 2511.11007 | null |
| 2025-11-14 | CLUE: Controllable Latent space of Unprompted Embeddings for Diversity Management in Text-to-Image Synthesis | Keunwoo Park et.al. | 2511.10993 | null |
| 2025-11-14 | Binary Verification for Zero-Shot Vision | Jeffrey Liu et.al. | 2511.10983 | null |
| 2025-11-13 | FengHuang: Next-Generation Memory Orchestration for AI Inferencing | Jiamin Li et.al. | 2511.10753 | null |
| 2025-11-13 | Diffusion in the stochastic Klein-Gordon equation | Jonathan Oppenheim et.al. | 2511.10738 | null |
| 2025-11-13 | Reaching for the Edge II: Stellar Halos out to Large Radii as a Tracer of Dark Matter Halo Mass | Katya Leidig et.al. | 2511.10723 | null |
| 2025-11-14 | OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer | Haosong Peng et.al. | 2511.10560 | null |
| 2025-11-13 | A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space | Huijie Liu et.al. | 2511.10555 | null |
| 2025-11-13 | SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation | Wei Li et.al. | 2511.10518 | null |
| 2025-11-13 | Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models | Zhengtao Zou et.al. | 2511.10292 | null |
| 2025-11-13 | PROPA: Toward Process-level Optimization in Visual Reasoning via Reinforcement Learning | Yanbei Jiang et.al. | 2511.10279 | null |
| 2025-11-13 | LiNeXt: Revisiting LiDAR Completion with Efficient Non-Diffusion Architectures | Wenzhe He et.al. | 2511.10209 | null |
| 2025-11-13 | AI-Integrated Decision Support System for Real-Time Market Growth Forecasting and Multi-Source Content Diffusion Analytics | Ziqing Yin et.al. | 2511.09962 | null |
| 2025-11-13 | Remember Me: Bridging the Long-Range Gap in LVLMs with Three-Step Inference-Only Decay Resilience Strategies | Peng Gao et.al. | 2511.09868 | null |
| 2025-11-12 | From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance | Jeongho Min et.al. | 2511.09820 | null |
| 2025-11-12 | Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models | Konstantinos M. Dafnis et.al. | 2511.09809 | null |
| 2025-11-12 | HeatGen: A Guided Diffusion Framework for Multiphysics Heat Sink Design Optimization | Hadi Keramati et.al. | 2511.09578 | null |
| 2025-11-12 | Controllable protein design through Feynman-Kac steering | Erik Hartman et.al. | 2511.09216 | null |
| 2025-11-12 | FSampler: Training Free Acceleration of Diffusion Sampling via Epsilon Extrapolation | Michael A. Vladimir et.al. | 2511.09180 | null |
| 2025-11-12 | Emission-Line and Continuum Reverberation Mapping of the NLS1 Galaxy WPVS 48 | M. A. Probst et.al. | 2511.09153 | null |
| 2025-11-12 | Diff-V2M: A Hierarchical Conditional Diffusion Model with Explicit Rhythmic Modeling for Video-to-Music Generation | Shulei Ji et.al. | 2511.09090 | null |
| 2025-11-12 | Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference | Chengze Jiang et.al. | 2511.09064 | null |
| 2025-11-12 | Expand Your SCOPE: Semantic Cognition over Potential-Based Exploration for Embodied Visual Navigation | Ningnan Wang et.al. | 2511.08935 | null |
| 2025-11-12 | From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model | Hanbo Cheng et.al. | 2511.08930 | null |
| 2025-11-12 | TiDAR: Think in Diffusion, Talk in Autoregression | Jingyu Liu et.al. | 2511.08923 | null |
| 2025-11-12 | Machines Serve Human: A Novel Variable Human-machine Collaborative Compression Framework | Zifu Zhang et.al. | 2511.08915 | null |
| 2025-11-04 | The Pervasive Blind Spot: Benchmarking VLM Inference Risks on Everyday Personal Videos | Shuning Zhang et.al. | 2511.02367 | null |
| 2025-10-26 | Encoder-Decoder Diffusion Language Models for Efficient Training and Inference | Marianne Arriola et.al. | 2510.22852 | null |
| 2025-10-26 | FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference | Divya Jyoti Bajpai et.al. | 2510.22641 | null |
| 2025-10-28 | Token-Level Inference-Time Alignment for Vision-Language Models | Kejia Chen et.al. | 2510.21794 | null |
| 2025-10-20 | SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference | Samir Khaki et.al. | 2510.17777 | null |
| 2025-10-22 | VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models | Qilin Liao et.al. | 2510.17759 | null |
| 2025-10-16 | Efficient Video Sampling: Pruning Temporally Redundant Tokens for Faster VLM Inference | Natan Bagrov et.al. | 2510.14624 | null |
| 2025-10-13 | Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation | Maggie Wang et.al. | 2510.11689 | null |
| 2025-10-13 | When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models | Samer Al-Hamadani et.al. | 2510.11302 | null |
| 2025-10-11 | Efficient Navigation in Unknown Indoor Environments with Vision-Language Models | D. Schwartz et.al. | 2510.04991 | null |
| 2025-10-03 | TridentServe: A Stage-level Serving System for Diffusion Pipelines | Yifei Xia et.al. | 2510.02838 | null |
| 2025-10-26 | EVODiff: Entropy-aware Variance Optimized Diffusion Inference | Shigui Li et.al. | 2509.26096 | null |
| 2025-09-28 | Sequential Diffusion Language Models | Yangzhou Liu et.al. | 2509.24007 | null |
| 2025-09-28 | HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models | Zhinan Xie et.al. | 2509.23928 | null |
| 2025-11-27 | Manifold-Aware Diffusion-Augmented Contrastive Learning for Noise-Robust Biosignal Representation | Rami Zewail et.al. | 2509.20048 | null |
| 2025-09-20 | Eye Gaze Tells You Where to Compute: Gaze-Driven Efficient VLMs | Qinyu Chen et.al. | 2509.16476 | null |
| 2025-09-21 | SpecVLM: Fast Speculative Decoding in Vision-Language Models | Haiduo Huang et.al. | 2509.11815 | null |
| 2025-09-15 | STADI: Fine-Grained Step-Patch Diffusion Parallelism for Heterogeneous GPUs | Han Liang et.al. | 2509.04719 | null |
| 2025-08-26 | MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs | Sixun Dong et.al. | 2508.18264 | null |
| 2025-08-20 | GM-Skip: Metric-Guided Transformer Block Skipping for Efficient Vision-Language Models | Lianming Huang et.al. | 2508.18227 | null |
| 2025-08-21 | Pretrained Diffusion Models Are Inherently Skipped-Step Samplers | Wenju Xu et.al. | 2508.15233 | null |
| 2025-08-11 | AdaptInfer: Adaptive Token Pruning for Vision-Language Model Inference with Dynamical Text Guidance | Weichen Zhang et.al. | 2508.06084 | null |
| 2025-08-07 | Real-Time Iteration Scheme for Diffusion Policy | Yufei Duan et.al. | 2508.05396 | null |
| 2025-07-23 | Accelerating Parallel Diffusion Model Serving with Residual Compression | Jiajun Luo et.al. | 2507.17511 | null |
| 2025-07-11 | BlindSight: Harnessing Sparsity for Efficient VLMs | Tharun Adithya Srikrishnan et.al. | 2507.09071 | null |
| 2025-09-30 | Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? | Mingyuan Wu et.al. | 2506.17417 | null |
| 2025-06-20 | Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models | Michael Plainer et.al. | 2506.17139 | null |
| 2025-06-18 | VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service | Xiasi Wang et.al. | 2506.15755 | null |
| 2025-07-01 | Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model | Anirud Aggarwal et.al. | 2506.15682 | null |
| 2025-06-12 | Adding simple structure at inference improves Vision-Language Compositionality | Imanol Miranda et.al. | 2506.09691 | null |
| 2025-06-09 | Event-Priori-Based Vision-Language Model for Efficient Visual Understanding | Haotong Qin et.al. | 2506.07627 | null |
| 2025-09-03 | RNE: plug-and-play diffusion inference-time control and energy-based training | Jiajun He et.al. | 2506.05668 | null |
| 2025-10-10 | Can Vision Language Models Infer Human Gaze Direction? A Controlled Study | Zory Zhang et.al. | 2506.05412 | null |
| 2025-10-05 | Inference-time Scaling of Diffusion Models through Classical Search | Xiangcheng Zhang et.al. | 2505.23614 | null |
| 2025-05-27 | InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling | Xiaoxiao Jiang et.al. | 2505.20600 | null |
| 2025-05-25 | SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation | Shenggan Cheng et.al. | 2505.19151 | null |
| 2025-06-13 | VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis | Tina Khezresmaeilzadeh et.al. | 2505.18570 | null |
| 2025-05-23 | VERDI: VLM-Embedded Reasoning for Autonomous Driving | Bowen Feng et.al. | 2505.15925 | null |
| 2025-05-20 | Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism | Kunyun Wang et.al. | 2505.14741 | null |
| 2025-04-14 | Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization | Haiyong Yu et.al. | 2504.09927 | null |
| 2025-04-15 | Metropolis-Hastings Captioning Game: Knowledge Fusion of Vision Language Models via Decentralized Bayesian Inference | Yuta Matsui et.al. | 2504.09620 | null |
| 2025-03-17 | VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers | Ruanjun Li et.al. | 2503.09387 | null |
| 2025-02-20 | Light communicative materials | Hongshuang Guo et.al. | 2503.05744 | null |
| 2025-02-21 | Evaluating Precise Geolocation Inference Capabilities of Vision Language Models | Neel Jay et.al. | 2502.14412 | null |
| 2025-10-08 | Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Yuta Oshima et.al. | 2501.19252 | null |
| 2025-02-10 | Membership Inference Attacks Against Vision-Language Models | Yuke Hu et.al. | 2501.18624 | null |
| 2025-03-10 | Probing the Quantum Nature of Gravity through Classical Diffusion | Oliviero Angeli et.al. | 2501.13030 | null |
| 2025-01-16 | PATCHEDSERVE: A Patch Management Framework for SLO-Optimized Hybrid Resolution Diffusion Serving | Desen Sun et.al. | 2501.09253 | null |
| 2025-01-16 | StructSR: Refuse Spurious Details in Real-World Image Super-Resolution | Yachao Li et.al. | 2501.05777 | link |
| 2024-12-19 | Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model | Minglong Xue et.al. | 2412.14630 | link |
| 2025-06-30 | Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension | Xiyao Wang et.al. | 2412.03704 | link |
| 2024-12-05 | A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs | Wangbo Zhao et.al. | 2412.03324 | link |
| 2024-12-02 | [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster | Qizhe Zhang et.al. | 2412.01818 | link |
| 2025-03-30 | Staleness-Centric Optimizations for Parallel Diffusion MoE Inference | Jiajun Luo et.al. | 2411.16786 | null |
| 2024-11-01 | VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration | Dezhan Tu et.al. | 2410.23317 | null |
| 2025-01-07 | Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance | Dongmin Park et.al. | 2410.22376 | link |
| 2024-10-30 | Natural Language Inference Improves Compositionality in Vision-Language Models | Paola Cascante-Bonilla et.al. | 2410.22315 | null |
| 2024-10-18 | Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models | Jie Ren et.al. | 2410.13088 | null |
| 2025-02-11 | ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time | Yi Ding et.al. | 2410.06625 | null |
| 2024-10-08 | A scaling limit for additive functionals | Thibaud Taillefumier et.al. | 2410.06383 | null |
| 2024-09-03 | CT-SDM: A Sampling Diffusion Model for Sparse-View CT Reconstruction across All Sampling Rates | Liutao Yang et.al. | 2409.01571 | null |
| 2024-07-27 | Faster Image2Video Generation: A Closer Look at CLIP Image Embedding’s Impact on Spatio-Temporal Cross-Attentions | Ashkan Taghipour et.al. | 2407.19205 | null |
| 2024-07-15 | LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis | Zhenxiong Tan et.al. | 2407.10468 | link |
| 2024-06-13 | DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning | Xuemin Hu et.al. | 2406.09089 | null |
| 2024-10-03 | I4VGen: Image as Free Stepping Stone for Text-to-Video Generation | Xiefan Guo et.al. | 2406.02230 | null |
| 2025-01-14 | Amortizing intractable inference in diffusion models for vision, language, and control | Siddarth Venkatraman et.al. | 2405.20971 | null |
| 2024-05-30 | DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation | Zachary Novack et.al. | 2405.20289 | null |
| 2024-05-26 | Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference | Xunpeng Huang et.al. | 2405.16387 | null |
| 2025-04-16 | Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models | Katherine Xu et.al. | 2405.14828 | null |
| 2024-04-25 | Inferring solid-state diffusivity in lithium-ion battery active materials: improving upon the classical GITT method | A. Emir Gumrukcuoglu et.al. | 2404.16658 | null |
| 2024-11-05 | Private Attribute Inference from Images with Vision-Language Models | Batuhan Tömekçe et.al. | 2404.10618 | null |
| 2024-05-02 | Privacy-Preserving Diffusion Model Using Homomorphic Encryption | Yaojian Chen et.al. | 2403.05794 | link |
| 2024-05-08 | ToDo: Token Downsampling for Efficient Generation of High-Resolution Images | Ethan Smith et.al. | 2402.13573 | null |
| 2024-06-03 | DITTO: Diffusion Inference-Time T-Optimization for Music Generation | Zachary Novack et.al. | 2401.12179 | null |
| 2023-12-10 | Statistical Spatially Inhomogeneous Diffusion Inference | Yinuo Ren et.al. | 2312.05793 | null |
| 2023-07-31 | Cross-Modal Concept Learning and Inference for Vision-Language Models | Yi Zhang et.al. | 2307.15460 | null |
| 2024-01-04 | Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference | Zihao Yu et.al. | 2305.17423 | link |
| 2023-10-25 | ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval | Kexun Zhang et.al. | 2302.02285 | link |
| 2021-08-11 | Manifold-aware Synthesis of High-resolution Diffusion from Structural Imaging | Benoit Anctil-Robitaille et.al. | 2108.04135 | null |
| 2021-12-22 | Functional Data Analysis with Rough Sample Paths? | Neda Mohammadi et.al. | 2105.12035 | null |
| 2014-06-03 | $C^0$-estimates and smoothness of solutions to the parabolic equation defined by Kimura operators | Camelia A. Pop et.al. | 1406.0742 | null |
| 2015-04-01 | On nonnegative unbiased estimators | Pierre E. Jacob et.al. | 1309.6473 | null |