Updated on 2025.06.28
Usage instructions: here
LLM inference
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-26 | Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces | Michael Johnston et.al. | 2506.21467 | null |
2025-06-26 | BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services | Zhaojiacheng Zhou et.al. | 2506.21033 | null |
2025-06-17 | Utility-Driven Speculative Decoding for Mixture-of-Experts | Anish Saxena et.al. | 2506.20675 | null |
2025-06-25 | DipSVD: Dual-importance Protected SVD for Efficient LLM Compression | Xuan Ding et.al. | 2506.20353 | null |
2025-06-25 | Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU | He Sun et.al. | 2506.20187 | null |
2025-06-24 | MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection | Zhengxiang Huang et.al. | 2506.19884 | null |
2025-06-24 | Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models | Jungwoo Park et.al. | 2506.19697 | null |
2025-06-25 | Adaptive Request Scheduling for CodeLLM Serving with SLA Guarantees | Shi Chang et.al. | 2506.19677 | null |
2025-06-23 | Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation | Ahmadreza Saboor Yaraghi et.al. | 2506.19045 | null |
2025-06-23 | WiLLM: An Open Wireless LLM Communication System | Boyi Liu et.al. | 2506.19030 | null |
2025-06-23 | LLMs on a Budget? Say HOLA | Zohaib Hasan Siddiqui et.al. | 2506.18952 | null |
2025-06-23 | CommVQ: Commutative Vector Quantization for KV Cache Compression | Junyan Li et.al. | 2506.18879 | null |
2025-06-26 | PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries | Steven Kolawole et.al. | 2506.18728 | null |
2025-06-22 | Mechanistic Interpretability in the Presence of Architectural Obfuscation | Marcos Florencio et.al. | 2506.18053 | null |
2025-06-22 | LLMs for Customized Marketing Content Generation and Evaluation at Scale | Haoran Liu et.al. | 2506.17863 | null |
2025-06-21 | LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning | Haoxuan Che et.al. | 2506.17562 | null |
2025-06-08 | Training-free LLM Verification via Recycling Few-shot Examples | Dongseok Lee et.al. | 2506.17251 | null |
2025-06-20 | Towards AI Search Paradigm | Yuchen Li et.al. | 2506.17188 | null |
2025-06-23 | From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents | Mohammad Amaan Sayeed et.al. | 2506.15911 | null |
2025-05-30 | Learn from the Past: Fast Sparse Indexing for Large Language Model Decoding | Feiyu Yao et.al. | 2506.15704 | null |
2025-06-18 | eLLM: Elastic Memory Management Framework for Efficient LLM Serving | Jiale Xu et.al. | 2506.15155 | null |
2025-06-17 | CrEst: Credibility Estimation for Contexts in LLMs via Weak Supervision | Dyah Adila et.al. | 2506.14912 | null |
2025-06-17 | Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching | Qizheng Zhang et.al. | 2506.14852 | null |
2025-06-05 | MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs | Zhenyan Lu et.al. | 2506.13772 | null |
2025-06-17 | Prefix-Tuning+: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention | Haonan Wang et.al. | 2506.13674 | null |
2025-06-16 | Vector Ontologies as an LLM world view extraction method | Kaspar Rothenfusser et.al. | 2506.13252 | link |
2025-06-16 | Empirical Evaluation of Large Language Models in Automated Program Repair | Jiajun Sun et.al. | 2506.13186 | null |
2025-06-19 | Serving Large Language Models on Huawei CloudMatrix384 | Pengfei Zuo et.al. | 2506.12708 | null |
2025-06-13 | Semantic Scheduling for LLM Inference | Wenyue Hua et.al. | 2506.12204 | link |
2025-05-21 | FlexQuant: A Flexible and Efficient Dynamic Precision Switching Framework for LLM Quantization | Fangxin Liu et.al. | 2506.12024 | null |
2025-06-13 | Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache | Xiaoran Liu et.al. | 2506.11886 | null |
2025-06-13 | GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news | Abdul Haque et.al. | 2506.11600 | null |
2025-06-13 | Collaborative LLM Inference via Planning for Efficient Reasoning | Byeongchan Lee et.al. | 2506.11578 | null |
2025-06-13 | Efficient Long-Context LLM Inference via KV Cache Clustering | Jie Hu et.al. | 2506.11418 | null |
2025-06-12 | From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review | Yaohui Zhang et.al. | 2506.11343 | null |
2025-06-12 | SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding | Ziyi Zhang et.al. | 2506.11309 | null |
2025-06-06 | DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration | Hanzhi Zhang et.al. | 2506.11104 | link |
2025-06-12 | Slimming Down LLMs Without Losing Their Minds | Qingda et.al. | 2506.10885 | null |
2025-06-12 | AdaptiveLLM: A Framework for Selecting Optimal Cost-Efficient LLM for Code-Generation Based on CoT Length | Junhang Cheng et.al. | 2506.10525 | link |
2025-06-12 | TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference | Hongbin Zhang et.al. | 2506.10470 | null |
2025-06-11 | A First Look at Bugs in LLM Inference Engines | Mugeng Liu et.al. | 2506.09713 | link |
2025-06-12 | Understanding the Performance and Power of LLM Inferencing on Edge Accelerators | Mayank Arya et.al. | 2506.09554 | null |
2025-06-11 | Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning | Jiayi Yuan et.al. | 2506.09501 | null |
2025-06-10 | Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$ | Chihiro Taguchi et.al. | 2506.08479 | null |
2025-06-10 | Draft-based Approximate Inference for LLMs | Kevin Galim et.al. | 2506.08373 | link |
2025-06-09 | MiniCPM4: Ultra-Efficient LLMs on End Devices | MiniCPM Team et.al. | 2506.07900 | link |
2025-06-09 | How Benchmark Prediction from Fewer Data Misses the Mark | Guanhua Zhang et.al. | 2506.07673 | link |
2025-06-09 | TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer Review | Yuan Chang et.al. | 2506.07642 | null |
2025-06-09 | MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts | Wei Tao et.al. | 2506.07533 | null |
2025-06-07 | Containerized In-Storage Processing and Computing-Enabled SSD Disaggregation | Miryeong Kwon et.al. | 2506.06769 | null |
2025-06-06 | Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques | Adarsh Prasad Behera et.al. | 2506.06579 | null |
2025-06-06 | Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage | Ziqi Yuan et.al. | 2506.06472 | null |
2025-06-04 | On the Fundamental Impossibility of Hallucination Control in Large Language Models | Michał P. Karpowicz et.al. | 2506.06382 | null |
2025-05-21 | Reward Is Enough: LLMs Are In-Context Reinforcement Learners | Kefan Song et.al. | 2506.06303 | null |
2025-06-06 | AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search | Yu Li et.al. | 2506.06017 | null |
2025-06-06 | FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model | Md Jueal Mia et.al. | 2506.05640 | link |
2025-06-11 | Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models | Yanzhao Zhang et.al. | 2506.05176 | null |
2025-06-05 | Are LLMs Reliable Translators of Logical Reasoning Across Lexically Diversified Contexts? | Qingchuan Li et.al. | 2506.04575 | link |
2025-06-04 | Cascadia: A Cascade Serving System for Large Language Models | Youhe Jiang et.al. | 2506.04203 | null |
2025-06-04 | SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling | Anhao Zhao et.al. | 2506.04179 | null |
2025-06-04 | GORACS: Group-level Optimal Transport-guided Coreset Selection for LLM-based Recommender Systems | Tiehua Mei et.al. | 2506.04015 | null |
2025-06-04 | Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | Junyi Chen et.al. | 2506.03887 | null |
2025-06-04 | Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis | Avihay Cohen et.al. | 2506.03656 | null |
2025-06-04 | POSS: Position Specialist Generates Better Draft for Speculative Decoding | Langlin Huang et.al. | 2506.03566 | link |
2025-06-07 | Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs | Jiakun Fan et.al. | 2506.03296 | null |
2025-06-03 | QKV Projections Require a Fraction of Their Memory | Malik Khalaf et.al. | 2506.02939 | null |
2025-06-03 | Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs | Shangmin Guo et.al. | 2506.02918 | null |
2025-06-14 | TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression | Zhong-Zhi Li et.al. | 2506.02678 | link |
2025-06-19 | KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider | Jiahao Wang et.al. | 2506.02634 | link |
2025-06-03 | HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference | Ping Gong et.al. | 2506.02572 | link |
2025-06-03 | Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective | Shenghua He et.al. | 2506.02553 | null |
2025-05-29 | NestedFP: High-Performance, Memory-Efficient Dual-Precision Floating Point Support for LLMs | Haeun Lee et.al. | 2506.02024 | null |
2025-05-24 | Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing | Zhaoyuan Su et.al. | 2506.02006 | null |
2025-05-16 | Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism | Yuhao Shen et.al. | 2506.01979 | null |
2025-06-02 | Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts | Spencer Banasik et.al. | 2506.01827 | null |
2025-05-13 | AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies | Amit Sharma et.al. | 2506.00008 | null |
2025-05-30 | AFLoRA: Adaptive Federated Fine-Tuning of Large Language Models with Resource-Aware Low-Rank Adaption | Yajie Zhou et.al. | 2505.24773 | null |
2025-05-30 | SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training | Yehonathan Refael et.al. | 2505.24749 | null |
2025-05-30 | Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching | Juan Wisznia et.al. | 2505.24643 | null |
2025-05-30 | LLM Inference Enhanced by External Knowledge: A Survey | Yu-Hsuan Lin et.al. | 2505.24377 | link |
2025-05-30 | SkyLB: A Locality-Aware Cross-Region Load Balancer for LLM Inference | Tian Xia et.al. | 2505.24095 | null |
2025-05-29 | Large Language Model Meets Constraint Propagation | Alexandre Bonlarron et.al. | 2505.24012 | null |
2025-05-29 | EmbAdvisor: Adaptive Cache Management for Sustainable LLM Serving | Yuyang Tian et.al. | 2505.23970 | null |
2025-05-29 | Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters | Hayden Moore et.al. | 2505.23554 | null |
2025-06-10 | Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism | Jinhui Wei et.al. | 2505.23219 | null |
2025-05-29 | SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference | Yinghao Tang et.al. | 2505.23022 | null |
2025-05-28 | Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference | Donghyeon Joo et.al. | 2505.22913 | link |
2025-05-28 | AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models | Feng Luo et.al. | 2505.22662 | null |
2025-05-28 | Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR | Mingchen Shao et.al. | 2505.22063 | null |
2025-05-28 | ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning | Zhendong Mi et.al. | 2505.21987 | null |
2025-05-28 | Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference | Yue Zhu et.al. | 2505.21919 | null |
2025-05-29 | EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse | Tianyu Guo et.al. | 2505.21889 | link |
2025-05-28 | HoliTom: Holistic Token Merging for Fast Video Large Language Models | Kele Shao et.al. | 2505.21334 | link |
2025-06-04 | LLMs Think, But Not In Your Flow: Reasoning-Level Personalization for Black-Box Large Language Models | Jieyong Kim et.al. | 2505.21082 | null |
2025-05-27 | Efficient Large Language Model Inference with Neural Block Linearization | Mete Erdogan et.al. | 2505.21077 | null |
2025-05-28 | FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration | Daehyeon Baek et.al. | 2505.20839 | null |
2025-05-26 | HAMburger: Accelerating LLM Inference via Token Smashing | Jingyu Liu et.al. | 2505.20438 | null |
2025-05-23 | Less Context, Same Performance: A RAG Framework for Resource-Efficient LLM-Based Clinical NLP | Satya Narayana Cheetirala et.al. | 2505.20320 | null |
2025-05-26 | APE: A Data-Centric Benchmark for Efficient LLM Adaptation in Text Summarization | Javier Marín et.al. | 2505.19912 | link |
2025-06-13 | MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE | Zongle Huang et.al. | 2505.19645 | null |
2025-05-26 | VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning | Maonan Wang et.al. | 2505.19486 | null |
2025-05-26 | BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs | Guilong Lu et.al. | 2505.19457 | link |
2025-05-26 | WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference | Sihan Chen et.al. | 2505.19427 | link |
2025-05-25 | DECA: A Near-Core LLM Decompression Accelerator Supporting Out-of-Order Invocation | Gerasimos Gerogiannis et.al. | 2505.19349 | null |
2025-05-25 | Can Large Language Models Infer Causal Relationships from Real-World Text? | Ryan Saklad et.al. | 2505.18931 | null |
2025-06-18 | ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models | Hao Chen et.al. | 2505.18799 | null |
2025-06-01 | A Survey of LLM $\times$ DATA | Xuanhe Zhou et.al. | 2505.18458 | link |
2025-05-23 | LatentLLM: Attention-Aware Joint Tensor Compression | Toshiaki Koike-Akino et.al. | 2505.18413 | null |
2025-05-23 | An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs | Rahul Thomas et.al. | 2505.18332 | null |
2025-05-23 | ELDeR: Getting Efficient LLMs through Data-Driven Regularized Layer-wise Pruning | Mingkuan Feng et.al. | 2505.18232 | null |
2025-05-23 | NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache | Donghyun Son et.al. | 2505.18231 | null |
2025-05-23 | Navigating Pitfalls: Evaluating LLMs in Machine Learning Programming Education | Smitha Kumar et.al. | 2505.18220 | null |
2025-05-23 | Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning | Michael Hassid et.al. | 2505.17813 | null |
2025-05-23 | DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies | Ning Yang et.al. | 2505.17420 | null |
2025-05-26 | RAP: Runtime-Adaptive Pruning for LLM Inference | Huanrong Liu et.al. | 2505.17138 | null |
2025-05-20 | Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency | Ruixiao Li et.al. | 2505.17074 | null |
2025-05-16 | SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs | Jinwoo Park et.al. | 2505.17052 | null |
2025-05-22 | CASTILLO: Characterizing Response Length Distributions of Large Language Models | Daniel F. Perez-Ramirez et.al. | 2505.16881 | link |
2025-05-24 | Recursive Offloading for LLM Serving in Multi-tier Networks | Zhiyuan Wu et.al. | 2505.16502 | link |
2025-05-22 | Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization | Vera Neplenbroek et.al. | 2505.16467 | link |
2025-05-22 | LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead | Yifan Zhang et.al. | 2505.16221 | null |
2025-05-31 | QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design | Benjamin Schneider et.al. | 2505.16175 | link |
2025-05-22 | KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization | Mingbo Song et.al. | 2505.16162 | null |
2025-05-21 | Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning | Jinghui Lu et.al. | 2505.15154 | null |
2025-05-21 | BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms | Yunlong Hou et.al. | 2505.15141 | null |
2025-06-04 | Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity | Susav Shrestha et.al. | 2505.14884 | link |
2025-05-20 | ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions | Bufang Yang et.al. | 2505.14668 | null |
2025-05-20 | ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs | Yifan Sui et.al. | 2505.14468 | null |
2025-05-20 | Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning | Jiwon Song et.al. | 2505.13866 | link |
2025-05-19 | Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training | Shane Bergsma et.al. | 2505.13738 | null |
2025-05-16 | An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents | Ayesha Amjad et.al. | 2505.13504 | null |
2025-04-02 | Large Language Model powered Symbolic Execution | Yihe Li et.al. | 2505.13452 | null |
2025-05-19 | Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately | Yuhang Wang et.al. | 2505.13326 | null |
2025-05-19 | HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding | Siran Liu et.al. | 2505.13254 | null |
2025-05-19 | FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference | Guangda Liu et.al. | 2505.13109 | null |
2025-05-19 | EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code | Yuhao Qing et.al. | 2505.13004 | link |
2025-05-25 | FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | Zihua Wang et.al. | 2505.12728 | link |
2025-05-19 | HydraInfer: Hybrid Disaggregated Scheduling for Multimodal Large Language Model Serving | Xianzhe Dong et.al. | 2505.12658 | null |
2025-05-17 | Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning | Yuheng Lu et.al. | 2505.11922 | null |
2025-05-17 | Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture | Yu Wu et.al. | 2505.11916 | null |
2025-05-25 | Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning | Yansong Ning et.al. | 2505.11827 | null |
2025-05-16 | TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference | Raja Gond et.al. | 2505.11329 | link |
2025-05-23 | SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning | Zheng Li et.al. | 2505.11274 | null |
2025-05-16 | Vaiage: A Multi-Agent Solution to Personalized Travel Planning | Binwen Liu et.al. | 2505.10922 | null |
2025-05-21 | SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices | Xiangwen Zhuge et.al. | 2505.10259 | link |
2025-06-05 | ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production | Yuxing Xiang et.al. | 2505.09999 | link |
2025-05-15 | How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference | Nidhal Jegham et.al. | 2505.09598 | null |
2025-05-14 | Statistical Modeling and Uncertainty Estimation of LLM Inference Systems | Kaustabha Ray et.al. | 2505.09319 | null |
2025-05-14 | ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor | Seungbeom Choi et.al. | 2505.09142 | null |
2025-05-13 | ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition | Keran Zheng et.al. | 2505.08981 | null |
2025-05-13 | LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries | Zekun Wu et.al. | 2505.08842 | null |
2025-05-13 | Automatic Task Detection and Heterogeneous LLM Speculative Decoding | Danying Ge et.al. | 2505.08600 | null |
2025-05-08 | Scaling Laws for Speculative Decoding | Siyuan Yan et.al. | 2505.07858 | null |
2025-05-12 | SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models | Hang Wu et.al. | 2505.07680 | null |
2025-05-12 | LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning | Xiaotian Lin et.al. | 2505.07437 | link |
2025-05-12 | Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity | Guang Yan et.al. | 2505.07239 | null |
2025-05-12 | PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications | Kuntai Du et.al. | 2505.07203 | null |
2025-06-15 | I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference | Zibo Gao et.al. | 2505.06738 | null |
2025-05-09 | Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference | Haolin Zhang et.al. | 2505.06461 | null |
2025-04-30 | Towards Efficient LLM Storage Reduction via Tensor Deduplication and Delta Compression | Zirui Wang et.al. | 2505.06252 | null |
2025-05-09 | Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM | Zehao Fan et.al. | 2505.05772 | null |
2025-05-08 | PRIMG: Efficient LLM-driven Test Generation Using Mutant Prioritization | Mohamed Salah Bouafif et.al. | 2505.05584 | link |
2025-05-08 | HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow | You Peng et.al. | 2505.05286 | link |
2025-05-12 | Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving | Shan Yu et.al. | 2505.04021 | null |
2025-05-31 | LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection | Xinyue Zeng et.al. | 2505.03793 | link |
2025-05-15 | GPU Performance Portability needs Autotuning | Burkhard Ringlein et.al. | 2505.03780 | link |
2025-04-21 | Splitwiser: Efficient LM inference with constrained resources | Asad Aali et.al. | 2505.03763 | link |
2025-04-07 | AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design | Yanbiao Liang et.al. | 2505.03745 | null |
2025-05-06 | Faster MoE LLM Inference for Extremely Large Models | Haoqi Yang et.al. | 2505.03531 | null |
2025-05-16 | 34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery | Yoel Zimmermann et.al. | 2505.03049 | null |
2025-05-05 | RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference | Yaoqi Chen et.al. | 2505.02922 | null |
2025-05-06 | EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices | Arnab Sanyal et.al. | 2505.02380 | null |
2025-05-03 | Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients | Yezhen Wang et.al. | 2505.01744 | null |
2025-05-03 | High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers | Brian Wong et.al. | 2505.01693 | null |
2025-05-08 | A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency | Sihyeong Park et.al. | 2505.01658 | link |
2025-05-02 | PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding | Bradley McDanel et.al. | 2505.01572 | null |
2025-05-01 | Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models | Andrew Adiletta et.al. | 2505.00817 | null |
2025-04-29 | Efficient LLMs with AMP: Attention Heads and MLP Pruning | Leandro Giusti Mugnaini et.al. | 2504.21174 | null |
2025-04-29 | Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts | Hanhua Hong et.al. | 2504.21117 | null |
2025-04-30 | Ascendra: Dynamic Request Prioritization for Efficient LLM Serving | Azam Ikram et.al. | 2504.20828 | null |
2025-04-30 | GenTorrent: Scaling Large Language Model Serving with An Overley Network | Fei Fang et.al. | 2504.20101 | null |
2025-04-24 | Tempo: Application-aware LLM Serving with Mixed SLO Requirements | Wei Zhang et.al. | 2504.20068 | null |
2025-04-28 | AutoJudge: Judge Decoding Without Manual Annotation | Roman Garipov et.al. | 2504.20039 | null |
2025-04-28 | semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage | Ke Hong et.al. | 2504.19867 | null |
2025-04-28 | Taming the Titans: A Survey of Efficient LLM Inference Serving | Ranran Zhen et.al. | 2504.19720 | link |
2025-04-28 | Bullet: Boosting GPU Utilization for LLM Serving via Dynamic Spatial-Temporal Orchestration | Zejia Lin et.al. | 2504.19516 | null |
2025-04-28 | R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference | Zhenyu Zhang et.al. | 2504.19449 | null |
2025-04-28 | Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory | Prateek Chhikara et.al. | 2504.19413 | null |
2025-05-07 | A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification | Junichiro Niimi et.al. | 2504.18884 | link |
2025-06-15 | PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation | Zihao An et.al. | 2504.18583 | null |
2025-04-25 | EcoServe: Enabling Cost-effective LLM Serving with Proactive Intra- and Inter-Instance Orchestration | Jiangsu Du et.al. | 2504.18154 | null |
2025-04-25 | PropRAG: Guiding Retrieval with Beam Search over Proposition Paths | Jingjin Wang et.al. | 2504.18070 | null |
2025-04-25 | Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving | Chang Xiao et.al. | 2504.17999 | null |
2025-04-24 | Energy Considerations of Large Language Model Inference and Efficiency Optimizations | Jared Fernandez et.al. | 2504.17674 | null |
2025-04-24 | L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference | Qingyuan Liu et.al. | 2504.17584 | null |
2025-04-24 | A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task | Jiaqi Deng et.al. | 2504.17547 | null |
2025-04-24 | On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration | Maoyang Xiang et.al. | 2504.17376 | null |
2025-04-26 | QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining | Fengze Liu et.al. | 2504.16511 | null |
2025-04-18 | HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing | Myunghyun Rhee et.al. | 2504.16112 | null |
2025-05-29 | Optimizing Token Consumption in LLMs: A Nano Surge Approach for Code Reasoning Efficiency | Junwei Hu et.al. | 2504.15989 | null |
2025-04-22 | SeaLLM: Service-Aware and Latency-Optimized Resource Sharing for Large Language Model Inference | Yihao Zhao et.al. | 2504.15720 | null |
2025-04-23 | A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings | Md Millat Hosen et.al. | 2504.15610 | link |
2025-04-21 | Speculative Sampling via Exponential Races | Szymon Kobus et.al. | 2504.15475 | null |
2025-05-20 | KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments | Junyoung Park et.al. | 2504.15364 | null |
2025-04-18 | High-Throughput LLM inference on Heterogeneous Clusters | Yi Xiong et.al. | 2504.15303 | null |
2025-04-17 | D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | Haodong Wang et.al. | 2504.15299 | null |
2025-06-12 | SLO-Aware Scheduling for Large Language Model Inferences | Jinqi Huang et.al. | 2504.14966 | null |
2025-04-21 | Hardware-based Heterogeneous Memory Management for Large Language Model Inference | Soojin Hwang et.al. | 2504.14893 | null |
2025-05-28 | gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling | Tianyu Guo et.al. | 2504.14775 | link |
2025-04-20 | Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions | Luyang Fang et.al. | 2504.14772 | null |
2025-04-22 | Optimizing SLO-oriented LLM Serving with PD-Multiplexing | Weihao Cui et.al. | 2504.14489 | null |
2025-04-19 | Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator | Akshat Ramachandran et.al. | 2504.14365 | null |
2025-04-19 | FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference | Coleman Hooper et.al. | 2504.14152 | null |
2025-05-12 | From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs | Jiliang Ni et.al. | 2504.13471 | null |
2025-05-23 | The Quantum LLM: Modeling Semantic Spaces with Quantum Principles | Timo Aukusti Laine et.al. | 2504.13202 | null |
2025-04-25 | Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving | Yaoyao Ding et.al. | 2504.12984 | null |
2025-04-17 | Data-efficient LLM Fine-tuning for Code Generation | Weijie Lv et.al. | 2504.12687 | link |
2025-04-16 | Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading | Kihyun Kim et.al. | 2504.11816 | link |
2025-04-16 | Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs | Hyungwoo Lee et.al. | 2504.11765 | null |
2025-04-16 | Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures | Prabhu Vellaisamy et.al. | 2504.11750 | null |
2025-04-16 | Progent: Programmable Privilege Control for LLM Agents | Tianneng Shi et.al. | 2504.11703 | link |
2025-04-15 | Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Ruicheng Ao et.al. | 2504.11320 | link |
2025-04-14 | HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving | Avinash Kumar et.al. | 2504.10724 | null |
2025-04-14 | Load Balancing with Network Latencies via Distributed Gradient Descent | Santiago R. Balseiro et.al. | 2504.10693 | null |
2025-04-14 | AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference | Yangshen Deng et.al. | 2504.10326 | null |
2025-04-14 | KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference | Yuxuan Tian et.al. | 2504.09936 | null |
2025-04-20 | Understanding and Optimizing Multi-Stage AI Inference Pipelines | Abhimanyu Rajeshkumar Bambhaniya et.al. | 2504.09775 | null |
2025-04-13 | Integrating Large Language Models for Automated Structural Analysis | Haoran Liang et.al. | 2504.09754 | null |
2025-04-13 | Efficient LLM Serving on Hybrid Real-time and Best-effort Requests | Wan Borui et.al. | 2504.09590 | null |
2025-04-13 | LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference | Jianing Zheng et.al. | 2504.09561 | link |
2025-04-12 | MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints | Yichao Yuan et.al. | 2504.09345 | null |
2025-05-22 | DynaServe: Unified and Elastic Execution for Dynamic Disaggregated LLM Serving | Chaoyi Ruan et.al. | 2504.09285 | null |
2025-04-11 | An Adaptive Vector Index Partitioning Scheme for Low-Latency RAG Pipeline | Junkyum Kim et.al. | 2504.08930 | null |
2025-04-11 | SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting | Jiaming Xu et.al. | 2504.08850 | null |
2025-05-31 | SD$^2$: Self-Distilled Sparse Drafters | Mike Lasby et.al. | 2504.08838 | null |
2025-04-07 | PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters | Zonghang Li et.al. | 2504.08791 | link |
2025-04-11 | Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash | Fucheng Jia et.al. | 2504.08378 | null |
2025-04-11 | Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices | Shengyuan Ye et.al. | 2504.08242 | null |
2025-04-10 | Token Level Routing Inference System for Edge Devices | Jianshu She et.al. | 2504.07878 | null |
2025-04-10 | A System for Comprehensive Assessment of RAG Frameworks | Mattia Rengo et.al. | 2504.07803 | link |
2025-04-10 | Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving | Shihong Gao et.al. | 2504.07494 | link |
2025-04-10 | UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference | Weikai Xu et.al. | 2504.07479 | null |
2025-04-24 | Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents | Yueying Li et.al. | 2504.07347 | null |
2025-04-08 | S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning | Hanqing Zeng et.al. | 2504.06426 | null |
2025-04-08 | SPIRe: Boosting LLM Inference Throughput with Speculative Decoding | Sanjit Neelam et.al. | 2504.06419 | null |
2025-04-08 | Mosaic: Composite Projection Pruning for Resource-efficient LLMs | Bailey J. Eccles et.al. | 2504.06323 | null |
2025-04-08 | Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching | Yanhao Dong et.al. | 2504.06319 | null |
2025-05-23 | Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Gleb Rodionov et.al. | 2504.06261 | null |
2025-05-27 | User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems | Jianling Wang et.al. | 2504.05522 | null |
2025-04-07 | REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding | Sakib Reza et.al. | 2504.05491 | null |
2025-04-07 | Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness | Dongzhuoran Zhou et.al. | 2504.05163 | null |
2025-05-20 | Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning | Sugyeong Eo et.al. | 2504.05047 | null |
2025-04-05 | PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models | Haofei Yin et.al. | 2504.04104 | null |
2025-04-03 | FlowKV: A Disaggregated Inference Framework with Low-Latency KV Cache Transfer and Load-Aware Scheduling | Weiqing Li et.al. | 2504.03775 | null |
2025-03-30 | VFlow: Discovering Optimal Agentic Workflows for Verilog Generation | Yangbo Wei et.al. | 2504.03723 | null |
2025-04-08 | MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization | Zongwu Wang et.al. | 2504.03661 | link |
2025-03-01 | Echo: Efficient Co-Scheduling of Hybrid Online-Offline Tasks for Large Language Model Serving | Zhibin Wang et.al. | 2504.03651 | null |
2025-02-22 | AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure | The AIBrix Team et.al. | 2504.03648 | null |
2025-04-04 | Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | Erik Johannes Husom et.al. | 2504.03360 | null |
2025-04-04 | Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation | Weitao Li et.al. | 2504.03165 | link |
2025-04-03 | Narrative Studio: Visual narrative exploration using LLMs and Monte Carlo Tree Search | Parsa Ghaffari et.al. | 2504.02426 | link |
2025-04-01 | SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching | Yuxuan Zhu et.al. | 2504.00970 | null |
2025-06-04 | Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding | Aayush Gautam et.al. | 2504.00030 | null |
2025-03-31 | TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance | Jingxian Xu et.al. | 2503.24198 | null |
2025-04-06 | ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance | Tong Xie et.al. | 2503.24053 | link |
2025-03-31 | Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving | Wei Gao et.al. | 2503.24000 | link |
2025-03-31 | Model Hemorrhage and the Robustness Limits of Large Language Models | Ziyang Ma et.al. | 2503.23924 | null |
2025-03-31 | MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration | Tatsuya Kubo et.al. | 2503.23817 | null |
2025-03-30 | Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference | Wei Tao et.al. | 2503.23294 | null |
2025-03-30 | PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference | Weisheng Jin et.al. | 2503.23274 | link |
2025-03-28 | Niyama: Breaking the Silos of LLM Inference Serving | Kanishk Goel et.al. | 2503.22562 | null |
2025-03-26 | Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation | Yunkai Liang et.al. | 2503.20552 | link |
2025-03-25 | LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Han Chen et.al. | 2503.19950 | link |
2025-03-24 | LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment | Varsha Embar et.al. | 2503.19090 | null |
2025-03-23 | SplitFrozen: Split Learning with Device-side Model Frozen for Fine-Tuning LLM on Heterogeneous Resource-Constrained Devices | Jian Ma et.al. | 2503.18986 | null |
2025-03-24 | xKV: Cross-Layer SVD for KV-Cache Compression | Chi-Chih Chang et.al. | 2503.18893 | link |
2025-04-21 | Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design | Rui Xie et.al. | 2503.18869 | null |
2025-05-14 | Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization | Minsu Kim et.al. | 2503.18599 | null |
2025-03-24 | DeepFund: Will LLM be Professional at Fund Investment? A Live Arena Perspective | Changlun Li et.al. | 2503.18313 | null |
2025-03-24 | Jenga: Effective Memory Management for Serving LLM with Heterogeneity | Chen Zhang et.al. | 2503.18292 | null |
2025-03-27 | WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference | Youhui Zuo et.al. | 2503.17922 | link |
2025-03-22 | PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling | Chongpeng Liu et.al. | 2503.17707 | null |
2025-03-21 | V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms | Javier J. Poveda Rodrigo et.al. | 2503.17422 | null |
2025-03-21 | Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation | Jingzhi Fang et.al. | 2503.16893 | null |
2025-05-16 | KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse | Huan Yang et.al. | 2503.16525 | null |
2025-03-20 | SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models | Fahao Chen et.al. | 2503.15921 | null |
2025-03-19 | Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study | Jomar Thomas Almonte et.al. | 2503.15248 | null |
2025-04-15 | ELTEX: A Framework for Domain-Driven Synthetic Data Generation | Arina Razmyslovich et.al. | 2503.15055 | link |
2025-03-19 | FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | Chongjun Tu et.al. | 2503.14935 | null |
2025-03-19 | Communication-Efficient Distributed On-Device LLM Inference Over Wireless Networks | Kai Zhang et.al. | 2503.14882 | null |
2025-03-21 | RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving | Wenqi Jiang et.al. | 2503.14649 | null |
2025-03-18 | PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play | Wei Fang et.al. | 2503.14432 | null |
2025-03-24 | Mitigating KV Cache Competition to Enhance User Experience in LLM Inference | Haiying Shen et.al. | 2503.13773 | null |
2025-03-17 | AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications | Haiying Shen et.al. | 2503.13737 | null |
2025-03-17 | ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts | Evangelos Georganas et.al. | 2503.13565 | null |
2025-03-14 | Examples as the Prompt: A Scalable Approach for Efficient LLM Adaptation in E-Commerce | Jingying Zeng et.al. | 2503.13518 | null |
2025-03-17 | xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Maximilian Beck et.al. | 2503.13427 | link |
2025-04-14 | VeriLeaky: Navigating IP Protection vs Utility in Fine-Tuning for LLM-Driven Verilog Coding | Zeng Wang et.al. | 2503.13116 | null |
2025-03-15 | TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation | Mayank Kumar et.al. | 2503.12217 | null |
2025-04-22 | Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques | Neusha Javidnia et.al. | 2503.11816 | null |
2025-05-19 | D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning | Jia Zhang et.al. | 2503.11441 | null |
2025-03-14 | MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | Jeong Hun Yeo et.al. | 2503.11315 | link |
2025-04-08 | Green Prompting | Marta Adamska et.al. | 2503.10666 | null |
2025-05-15 | Collaborative Speculative Inference for Efficient LLM Inference Serving | Luyao Gao et.al. | 2503.10325 | null |
2025-03-17 | Exploiting Edited Large Language Models as General Scientific Optimizers | Qitan Lv et.al. | 2503.09620 | null |
2025-03-13 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | Md Mohaiminul Islam et.al. | 2503.09590 | link |
2025-05-23 | Prompt Inference Attack on Distributed Large Language Model Inference Frameworks | Xinjian Luo et.al. | 2503.09291 | null |
2025-05-02 | Prompt Inversion Attack against Collaborative Inference of Large Language Models | Wenjie Qu et.al. | 2503.09022 | null |
2025-03-19 | Enhancing High-Quality Code Generation in Large Language Models with Comparative Prefix-Tuning | Yuan Jiang et.al. | 2503.09020 | link |
2025-03-11 | Position-Aware Depth Decay Decoding ($D^3$): Boosting Large Language Model Inference Efficiency | Siqi Fan et.al. | 2503.08524 | null |
2025-03-11 | FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework | Jianian Zhu et.al. | 2503.08461 | null |
2025-03-19 | TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems | Feiyang Wu et.al. | 2503.08415 | link |
2025-03-11 | Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference | Pol G. Recasens et.al. | 2503.08311 | null |
2025-03-09 | Seesaw: High-throughput LLM Inference via Model Re-sharding | Qidong Su et.al. | 2503.06433 | null |
2025-02-24 | Encoding Inequity: Examining Demographic Bias in LLM-Driven Robot Caregiving | Raj Korpan et.al. | 2503.05765 | null |
2025-03-07 | Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching | Bowen Pang et.al. | 2503.05248 | link |
2025-05-21 | Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching | Simon A. Aytes et.al. | 2503.05179 | link |
2025-03-07 | SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding | Kaiyu Huang et.al. | 2503.05096 | null |
2025-03-07 | Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size | Alireza Behtash et.al. | 2503.04704 | null |
2025-03-15 | Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking | Yijie Xu et.al. | 2503.04636 | null |
2025-03-06 | AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services | Xiaoqi Wang et.al. | 2503.04418 | null |
2025-03-06 | Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search | Kou Misaki et.al. | 2503.04412 | null |
2025-03-06 | ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput | Junsoo Kim et.al. | 2503.04253 | null |
2025-03-06 | Beyond Memorization: Evaluating the True Type Inference Capabilities of LLMs for Java Code Snippets | Yiwen Dong et.al. | 2503.04076 | null |
2025-03-04 | FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference | Hongchao Du et.al. | 2503.03777 | null |
2025-03-05 | MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems | Rui Ye et.al. | 2503.03686 | null |
2025-03-05 | Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems | Yaoru Li et.al. | 2503.03505 | link |
2025-03-05 | Enhancing Memory Efficiency in Large Language Model Training Through Chronos-aware Pipeline Parallelism | Xinyuan Lin et.al. | 2503.03182 | null |
2025-03-04 | PersonaX: A Recommendation Agent Oriented User Modeling Framework for Long Behavior Sequence | Yunxiao Shi et.al. | 2503.02398 | link |
2025-03-04 | VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference | Zihan Liu et.al. | 2503.02236 | null |
2025-02-26 | Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis | Long Cheng et.al. | 2503.01873 | null |
2025-04-30 | SAGE: A Framework of Precise Retrieval for RAG | Jintao Zhang et.al. | 2503.01713 | null |
2025-03-03 | Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Xinsheng Wang et.al. | 2503.01710 | link |
2025-03-03 | DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems | Minoo Hosseinzadeh et.al. | 2503.01704 | null |
2025-03-15 | Towards An Efficient LLM Training Paradigm for CTR Prediction | Allen Lin et.al. | 2503.01001 | null |
2025-03-02 | Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers | Yiran Zhao et.al. | 2503.00865 | null |
2025-03-01 | Tutorial Proposal: Speculative Decoding for Efficient LLM Inference | Heming Xia et.al. | 2503.00491 | null |
2025-03-01 | Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving | Qihui Zhou et.al. | 2503.00392 | null |
2025-02-28 | FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference | Xunhao Lai et.al. | 2502.20766 | link |
2025-05-04 | SPD: Sync-Point Drop for efficient tensor parallelism of Large Language Models | Han-Byul Kim et.al. | 2502.20727 | null |
2025-04-02 | Smart Routing: Cost-Effective Multi-LLM Serving for Multi-Core AIOS | Kai Mei et.al. | 2502.20576 | link |
2025-02-27 | M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging | Jinghao Feng et.al. | 2502.20301 | null |
2025-02-26 | Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs | Yiheng Yang et.al. | 2502.19078 | null |
2025-02-26 | Evidence-Driven Marker Extraction for Social Media Suicide Risk Detection | Carter Adams et.al. | 2502.18823 | null |
2025-02-24 | LLM Inference Acceleration via Efficient Operation Fusion | Mahsa Salmani et.al. | 2502.17728 | null |
2025-02-24 | CodeSwift: Accelerating LLM Inference for Efficient Code Generation | Qianhui Zhao et.al. | 2502.17139 | null |
2025-02-24 | Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM | Lian Liu et.al. | 2502.16963 | null |
2025-02-24 | DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance | Xuanfan Ni et.al. | 2502.16886 | null |
2025-03-01 | CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter | Yepeng Weng et.al. | 2502.16880 | null |
2025-02-23 | DISC: Dynamic Decomposition Improves LLM Inference Scaling | Jonathan Light et.al. | 2502.16706 | null |
2025-02-23 | Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines | Xinwei Long et.al. | 2502.16641 | null |
2025-05-01 | TerEffic: Highly Efficient Ternary LLM Inference on FPGA | Chenyang Yin et.al. | 2502.16473 | null |
2025-02-27 | Dynamic Parallel Tree Search for Efficient LLM Reasoning | Yifu Ding et.al. | 2502.16235 | null |
2025-02-21 | KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse | Jingbo Yang et.al. | 2502.16002 | link |
2025-02-14 | Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | Bowen Pang et.al. | 2502.15763 | null |
2025-02-21 | Towards Swift Serverless LLM Cold Starts with ParaServe | Chiheng Lou et.al. | 2502.15524 | null |
2025-02-24 | HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings | Rasmus Aavang et.al. | 2502.15411 | link |
2025-02-24 | Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference | Yaohua Tang et.al. | 2502.15294 | null |
2025-02-21 | A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation | Shilong Hou et.al. | 2502.15233 | link |
2025-02-19 | EvoP: Robust LLM Inference via Evolutionary Pruning | Shangyu Wu et.al. | 2502.14910 | null |
2025-04-21 | LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention | Shang Yang et.al. | 2502.14866 | link |
2025-02-20 | Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale | Shashwat Jaiswal et.al. | 2502.14617 | null |
2025-02-20 | SR-LLM: Rethinking the Structured Representation in Large Language Model | Jiahuan Zhang et.al. | 2502.14352 | null |
2025-02-20 | Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications | Kayhan Behdin et.al. | 2502.14305 | null |
2025-02-19 | RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression | Payman Behnam et.al. | 2502.14051 | null |
2025-02-19 | Autellix: An Efficient Serving Engine for LLM Agents as General Programs | Michael Luo et.al. | 2502.13965 | null |
2025-02-19 | Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference | Qingfa Xiao et.al. | 2502.13542 | null |
2025-02-19 | What are Models Thinking about? Understanding Large Language Model Hallucinations “Psychology” through Model Inner State Analysis | Peiran Wang et.al. | 2502.13490 | null |
2025-02-24 | BaKlaVa – Budgeted Allocation of KV cache for Long-context Inference | Ahmed Burak Gulhan et.al. | 2502.13176 | null |
2025-02-18 | SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems | Mike Zhang et.al. | 2502.12927 | link |
2025-03-27 | R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs | Sumin Jo et.al. | 2502.12767 | link |
2025-02-18 | HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | Cheng Luo et.al. | 2502.12574 | link |
2025-02-18 | Distributed On-Device LLM Inference With Over-the-Air Computation | Kai Zhang et.al. | 2502.12559 | null |
2025-02-18 | SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs | Ahmed F. AbouElhamayed et.al. | 2502.12444 | link |
2025-02-17 | Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs | Kan Zhu et.al. | 2502.12216 | null |
2025-02-17 | Designing Role Vectors to Improve LLM Inference Behaviour | Daniele Potertì et.al. | 2502.12055 | null |
2025-02-17 | DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services | Ting Sun et.al. | 2502.11417 | null |
2025-02-17 | Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment | Ben Dong et.al. | 2502.11347 | null |
2025-02-16 | Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View | Yanran Wu et.al. | 2502.11256 | null |
2025-02-16 | Diversified Sampling Improves Scaling LLM inference | Tianchun Wang et.al. | 2502.11027 | null |
2025-02-16 | Leveraging Uncertainty Estimation for Efficient LLM Routing | Tuo Zhang et.al. | 2502.11021 | null |
2025-04-07 | Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings | Liangqi Yuan et.al. | 2502.11007 | link |
2025-02-15 | Pushing up to the Limit of Memory Bandwidth and Capacity Utilization for Efficient LLM Decoding on Embedded FPGA | Jindong Li et.al. | 2502.10659 | null |
2025-02-05 | QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache | Rishabh Tiwari et.al. | 2502.10424 | null |
2025-02-14 | λScale: Enabling Fast Scaling for Serverless Large Language Model Inference | Minchen Yu et.al. | 2502.09922 | null |
2025-02-14 | INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing | Hongsun Jang et.al. | 2502.09921 | null |
2025-02-13 | On multi-token prediction for efficient LLM inference | Somesh Mehra et.al. | 2502.09419 | null |
2025-02-13 | ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments | Youhe Jiang et.al. | 2502.09334 | null |
2025-03-21 | RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models | Quan Wei et.al. | 2502.09003 | null |
2025-02-13 | InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU | Heejun Lee et.al. | 2502.08910 | null |
2025-02-13 | DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation | Tangyu Jiang et.al. | 2502.08905 | null |
2025-02-12 | Universal Model Routing for Efficient LLM Inference | Wittawat Jitkrittum et.al. | 2502.08773 | null |
2025-02-12 | MoLoRec: A Generalizable and Efficient Framework for LLM-Based Recommendation | Min Hou et.al. | 2502.08271 | null |
2025-02-12 | Memory Offloading for Large Language Model Inference with Latency SLO Guarantees | Chenxiang Ma et.al. | 2502.08182 | null |
2025-02-12 | Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences | Shanshan Han et.al. | 2502.08142 | null |
2025-03-19 | Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding | Ziyao Wang et.al. | 2502.08020 | null |
2025-02-11 | HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment | Youhe Jiang et.al. | 2502.07903 | null |
2025-02-11 | SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters | Yiping Wang et.al. | 2502.07832 | null |
2025-03-21 | PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference | Yufeng Gu et.al. | 2502.07578 | link |
2025-03-05 | Online Scheduling for LLM Inference with KV Cache Constraints | Patrick Jaillet et.al. | 2502.07115 | null |
2025-02-10 | Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Haiduo Huang et.al. | 2502.06282 | link |
2025-03-15 | Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models | Soham Poddar et.al. | 2502.05610 | null |
2025-02-08 | Mechanistic Interpretability of Emotion Inference in Large Language Models | Ala N. Tak et.al. | 2502.05489 | null |
2025-02-07 | BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference | Reena Elangovan et.al. | 2502.05376 | null |
2025-01-31 | Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies | Nadav Timor et.al. | 2502.05202 | null |
2025-03-15 | EcoServe: Designing Carbon-Aware AI Inference Systems | Yueying Li et.al. | 2502.05043 | null |
2025-02-07 | LLM Query Scheduling with Prefix Reuse and Latency Constraints | Gregory Dexter et.al. | 2502.04677 | null |
2025-02-18 | WaferLLM: A Wafer-Scale LLM Inference System | Congjie He et.al. | 2502.04563 | null |
2025-02-25 | KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference | Xing Li et.al. | 2502.04420 | link |
2025-02-06 | CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Zehua Pei et.al. | 2502.04416 | link |
2025-02-11 | Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing | Kunfeng Lai et.al. | 2502.04411 | null |
2025-02-26 | AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference | Qingyue Yang et.al. | 2502.04077 | link |
2025-02-06 | CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing | Yu Yuan et.al. | 2502.03997 | null |
2025-02-06 | Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective | Yuan Feng et.al. | 2502.03805 | link |
2025-04-04 | Adaptive Semantic Prompt Caching with VectorQ | Luis Gaspar Schroeder et.al. | 2502.03771 | null |
2025-02-05 | Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training | Reza Shirkavand et.al. | 2502.03604 | null |
2025-02-05 | HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference | Zeyu Zhang et.al. | 2502.03589 | null |
2025-02-05 | Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL | Wenbo Sun et.al. | 2502.02818 | null |
2025-02-05 | Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation | Jingyu Liu et.al. | 2502.02789 | link |
2025-02-04 | LLM Bandit: Cost-Efficient LLM Generation via Preference-Conditioned Dynamic Routing | Yang Li et.al. | 2502.02743 | null |
2025-02-04 | EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization | Yize Wu et.al. | 2502.02493 | null |
2025-01-30 | Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency | Sazzad Hossain et.al. | 2502.01651 | null |
2025-02-06 | An Investigation of FP8 Across Accelerators for LLM Inference | Jiwoo Kim et.al. | 2502.01070 | null |
2025-02-02 | Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference | Patrick Yubeaton et.al. | 2502.00922 | null |
2025-02-02 | MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies | Ehsaneddin Asgari et.al. | 2502.00894 | null |
2025-02-02 | SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models | Jiawen Zhang et.al. | 2502.00847 | null |
2025-02-02 | Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs | Youhe Jiang et.al. | 2502.00722 | null |
2025-02-13 | Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning | Zhi Zhou et.al. | 2502.00511 | null |
2025-02-01 | UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs | Yizhe Xiong et.al. | 2502.00439 | null |
2025-02-01 | ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | Xiang Liu et.al. | 2502.00299 | null |
2025-01-16 | Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models | Tom Wallace et.al. | 2502.00046 | null |
2025-02-07 | Pushing the Limits of BFP on Narrow Precision LLM Inference | Hui Wang et.al. | 2502.00026 | null |
2025-02-14 | Reward-Guided Speculative Decoding for Efficient LLM Reasoning | Baohao Liao et.al. | 2501.19324 | null |
2025-01-31 | Pheromone-based Learning of Optimal Reasoning Paths | Anirudh Chari et.al. | 2501.19278 | null |
2025-01-31 | Structural Embedding Projection for Contextual Large Language Model Inference | Vincent Enoasmo et.al. | 2501.18826 | null |
2025-01-29 | On the Partitioning of GPU Power among Multi-Instances | Tirth Vamja et.al. | 2501.17752 | null |
2025-02-02 | RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations | Zunhai Su et.al. | 2501.16383 | null |
2025-01-27 | Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs | Antony Bartlett et.al. | 2501.16191 | null |
2025-01-27 | TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference | Jack Min Ong et.al. | 2501.16007 | null |
2025-01-27 | Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference | Tharindu B. Hewage et.al. | 2501.15829 | link |
2025-01-25 | Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads | Xingyang He et.al. | 2501.15113 | null |
2025-01-25 | PatchRec: Multi-Grained Patching for Efficient LLM-based Sequential Recommendation | Jiayi Liao et.al. | 2501.15087 | null |
2025-02-09 | HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location | Ting Sun et.al. | 2501.14808 | null |
2025-01-11 | HeteroLLM: Accelerating Large Language Model Inference on Mobile SoCs platform with Heterogeneous AI Accelerators | Le Chen et.al. | 2501.14794 | null |
2025-01-04 | DeServe: Towards Affordable Offline LLM Inference via Decentralization | Linyu Wu et.al. | 2501.14784 | null |
2024-12-13 | KVDirect: Distributed Disaggregated LLM Inference | Shiyang Chen et.al. | 2501.14743 | null |
2025-01-24 | Accelerated Preference Elicitation with LLM-Based Proxies | David Huang et.al. | 2501.14625 | null |
2025-01-27 | DeepFlow: Serverless Large Language Model Serving at Scale | Junhao Hu et.al. | 2501.14417 | null |
2025-01-24 | Locality-aware Fair Scheduling in LLM Serving | Shiyi Cao et.al. | 2501.14312 | null |
2025-01-24 | Serving Long-Context LLMs at the Mobile Edge: Test-Time Reinforcement Learning-based Model Caching and Inference Offloading | Minrui Xu et.al. | 2501.14205 | null |
2025-01-08 | iServe: An Intent-based Serving System for LLMs | Dimitrios Liakopoulos et.al. | 2501.13111 | null |
2025-01-24 | EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation | Yifan Yu et.al. | 2501.12689 | null |
2025-03-16 | Human-like conceptual representations emerge from language prediction | Ningyu Xu et.al. | 2501.12547 | null |
2025-01-21 | AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding | Zikun Li et.al. | 2501.12162 | null |
2025-02-11 | Glinthawk: A Two-Tiered Architecture for Offline LLM Inference | Pouya Hamadanian et.al. | 2501.11779 | link |
2025-01-20 | Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas | Nishant Balepur et.al. | 2501.11549 | link |
2025-03-21 | GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation | Shashikant Ilager et.al. | 2501.11006 | link |
2025-03-06 | A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks | Xinzhe Li et.al. | 2501.10069 | link |
2025-01-16 | PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks | Huiyou Zhan et.al. | 2501.09367 | null |
2025-01-16 | Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition | Takaaki Hori et.al. | 2501.09258 | null |
2025-01-16 | Split Fine-Tuning for Large Language Models in Wireless Networks | Songge Zhang et.al. | 2501.09237 | null |
2025-01-15 | Guiding Retrieval using LLM-based Listwise Rankers | Mandeep Rathee et.al. | 2501.09186 | link |
2025-01-14 | Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings | Paul Joe Maliakel et.al. | 2501.08219 | null |
2025-01-14 | PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving | Ahmet Caner Yüzügüler et.al. | 2501.08192 | null |
2025-01-14 | Hierarchical Autoscaling for Large Language Model Serving with Chiron | Archit Patke et.al. | 2501.08090 | null |
2025-01-12 | MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference | Wenxuan Zeng et.al. | 2501.06807 | null |
2025-01-12 | Mell: Memory-Efficient Large Language Model Serving via Multi-GPU KV Cache Management | Liu Qianli et.al. | 2501.06709 | null |
2025-02-07 | Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping | Muru Zhang et.al. | 2501.06589 | link |
2025-01-15 | Multimodal-to-Text Prompt Engineering in Large Language Models Using Feature Embeddings for GNSS Interference Characterization | Harshith Manjunath et.al. | 2501.05079 | null |
2025-02-08 | Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text | Ali Al-Lawati et.al. | 2501.03166 | link |
2025-01-05 | TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms | Jovan Stojkovic et.al. | 2501.02600 | null |
2025-01-04 | AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference | Zhuomin He et.al. | 2501.02336 | link |
2024-12-31 | Towards Sustainable Large Language Model Serving | Sophia Nguyen et.al. | 2501.01990 | null |
2025-01-03 | Efficient LLM Inference with Activation Checkpointing and Hybrid Caching | Sanghyeon Lee et.al. | 2501.01792 | null |
2025-01-03 | (WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges | Mohamed Hisham Abdellatif et.al. | 2501.01588 | null |
2025-01-21 | BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference | Wonsuk Jang et.al. | 2501.01144 | link |
2025-01-02 | FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Zihao Ye et.al. | 2501.01005 | link |
2025-02-25 | Rethinking Layer Removal: A Hybrid Pruning Framework Combining Layer Removal and Singular Value Selection for Efficient LLM Compression | Kainan Liu et.al. | 2501.00339 | null |
2024-12-23 | Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs | Dibakar Gope et.al. | 2501.00032 | link |
2024-12-29 | TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication | Zongwu Wang et.al. | 2412.20501 | link |
2024-12-29 | GreenLLM: Disaggregating Large Language Model Serving on Heterogeneous GPUs for Lower Carbon Emissions | Tianyao Shi et.al. | 2412.20322 | null |
2025-01-15 | LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System | Hyucksung Kwon et.al. | 2412.20166 | null |
2024-12-19 | GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors | Chengming Zhang et.al. | 2412.19829 | null |
2025-01-05 | Gradient Weight-normalized Low-rank Projection for Efficient LLM Training | Jia-Hong Huang et.al. | 2412.19616 | link |
2025-01-02 | A Survey on Large Language Model Acceleration based on KV Cache Management | Haoyang Li et.al. | 2412.19442 | link |
2025-02-13 | An Engorgio Prompt Makes Large Language Model Babble on | Jianshuo Dong et.al. | 2412.19394 | link |
2024-12-25 | Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference | Libo Zhang et.al. | 2412.18934 | null |
2024-12-24 | TimelyLLM: Segmented LLM Serving System for Time-sensitive Robotic Applications | Neiwen Ling et.al. | 2412.18695 | null |
2024-12-26 | KunServe: Elastic and Efficient Large Language Model Serving with Parameter-centric Memory Management | Rongxin Cheng et.al. | 2412.18169 | null |
2025-02-22 | Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media | Zhen Sun et.al. | 2412.18148 | null |
2024-12-24 | Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels | Mingcong Song et.al. | 2412.18106 | null |
2024-12-23 | Trustworthy and Efficient LLMs Meet Databases | Kyoungmin Kim et.al. | 2412.18022 | null |
2025-02-20 | GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference | Chao Zeng et.al. | 2412.17560 | null |
2025-02-18 | VilBias: A Study of Bias Detection through Linguistic and Visual Cues, presenting Annotation Strategies, Evaluation, and Key Challenges | Shaina Raza et.al. | 2412.17052 | link |
2024-12-21 | SYMPHONY: Improving Memory Management for LLM Inference Workloads | Saurabh Agarwal et.al. | 2412.16434 | null |
2024-12-20 | WebLLM: A High-Performance In-Browser LLM Inference Engine | Charlie F. Ruan et.al. | 2412.15803 | link |
2024-12-19 | Fietje: An open, efficient LLM for Dutch | Bram Vanroy et.al. | 2412.15450 | link |
2024-12-19 | PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization | Jiayi Wu et.al. | 2412.14510 | link |
2024-12-19 | Are Longer Prompts Always Better? Prompt Selection in Large Language Models for Recommendation Systems | Genki Kusano et.al. | 2412.14454 | null |
2024-12-18 | A Survey on LLM Inference-Time Self-Improvement | Xiangjue Dong et.al. | 2412.14352 | link |
2024-12-18 | Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models | Seungeun Oh et.al. | 2412.12687 | null |
2024-12-17 | A System for Microserving of LLMs | Hongyi Jin et.al. | 2412.12488 | null |
2024-12-17 | LITA: An Efficient LLM-assisted Iterative Topic Augmentation Framework | Chia-Hsuan Chang et.al. | 2412.12459 | null |
2024-12-16 | CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation | Hongxuan Zhang et.al. | 2412.11741 | null |
2025-01-20 | FinLoRA: Finetuning Quantized Financial Large Language Models Using Low-Rank Adaptation | Dannong Wang et.al. | 2412.11378 | null |
2025-01-09 | Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning | Yun Qu et.al. | 2412.11120 | link |
2024-12-15 | NITRO: LLM Inference on Intel Laptop NPUs | Anthony Fei et.al. | 2412.11053 | link |
2025-03-11 | SCBench: A KV Cache-Centric Analysis of Long-Context Methods | Yucheng Li et.al. | 2412.10319 | null |
2024-12-17 | TurboAttention: Efficient Attention Approximation For High Throughputs LLMs | Hao Kang et.al. | 2412.08585 | null |
2024-12-11 | Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths | Naryeong Kim et.al. | 2412.08281 | null |
2024-12-12 | TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch | Xingchen Song et.al. | 2412.08237 | null |
2024-12-09 | Asynchronous LLM Function Calling | In Gim et.al. | 2412.07017 | null |
2024-12-08 | Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization | Dongwei Wang et.al. | 2412.06858 | null |
2024-12-09 | JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Takuro Fujii et.al. | 2412.06738 | link |
2024-12-09 | SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs | James Vo et.al. | 2412.06198 | null |
2024-12-08 | XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference | Weizhuo Li et.al. | 2412.05896 | null |
2025-02-17 | APOLLO: SGD-like Memory, AdamW-level Performance | Hanqing Zhu et.al. | 2412.05270 | link |
2024-12-06 | Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale? | Seyed Amin Tabatabaei et.al. | 2412.05137 | null |
2024-12-11 | Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference | Qingyuan Li et.al. | 2412.04964 | null |
2025-01-26 | GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments | Yanyu Chen et.al. | 2412.04788 | null |
2024-12-09 | Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems | Ayush Gundawar et.al. | 2412.04569 | link |
2024-12-03 | Multi-Bin Batching for Increasing LLM Inference Throughput | Ozgur Guldogan et.al. | 2412.04504 | null |
2025-01-17 | BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching | Zhen Zheng et.al. | 2412.03594 | null |
2024-12-04 | Unifying KV Cache Compression for Large Language Models with LeanKV | Yanqi Zhang et.al. | 2412.03131 | null |
2024-12-03 | Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity | Da Ma et.al. | 2412.02252 | null |
2024-12-02 | Data-Centric and Heterogeneity-Adaptive Sequence Parallelism for Efficient LLM Training | Yujie Wang et.al. | 2412.01523 | null |
2024-12-02 | PLD+: Accelerating LLM inference by leveraging Language Model Artifacts | Shwetha Somasundaram et.al. | 2412.01447 | null |
2024-12-02 | Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking | Marco Federici et.al. | 2412.01380 | null |
2024-12-02 | Can Large Language Models Serve as Evaluators for Code Summarization? | Yang Wu et.al. | 2412.01333 | link |
2024-12-05 | RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy | Geonho Lee et.al. | 2412.01129 | null |
2024-12-02 | TruncFormer: Private LLM Inference Using Only Truncations | Patrick Yubeaton et.al. | 2412.01042 | null |
2024-11-25 | Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration | Zhuofan Wen et.al. | 2412.00061 | null |
2024-11-29 | A dynamic parallel method for performance optimization on hybrid CPUs | Luo Yu et.al. | 2411.19542 | null |
2024-12-04 | Marconi: Prefix Caching for the Era of Hybrid LLMs | Rui Pan et.al. | 2411.19379 | null |
2024-12-08 | Puzzle: Distillation-Based NAS for Inference-Optimized LLMs | Akhiad Bercovich et.al. | 2411.19146 | null |
2024-11-27 | FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Ao Shen et.al. | 2411.18424 | null |
2024-11-29 | InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks | Xinyao Zheng et.al. | 2411.18191 | null |
2024-11-28 | MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache | Akshat Sharma et.al. | 2411.18077 | null |
2024-11-24 | Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou et.al. | 2411.17741 | null |
2024-11-18 | Generative AI on the Edge: Architecture and Performance Evaluation | Zeinab Nezami et.al. | 2411.17712 | null |
2024-11-26 | Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism | Yi-Chien Lin et.al. | 2411.17651 | null |
2024-11-26 | PIM-AI: A Novel Architecture for High-Efficiency LLM Inference | Cristobal Ortega et.al. | 2411.17309 | null |
2024-11-26 | Star Attention: Efficient LLM Inference over Long Sequences | Shantanu Acharya et.al. | 2411.17116 | link |
2024-11-26 | Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation | Chaoyi Jiang et.al. | 2411.17089 | null |
2024-11-25 | MixPE: Quantization and Hardware Co-design for Efficient LLM Inference | Yu Zhang et.al. | 2411.16158 | null |
2024-11-24 | eFedLLM: Efficient LLM Inference Based on Federated Learning | Shengwen Ding et.al. | 2411.16003 | null |
2024-11-24 | Ensuring Fair LLM Serving Amid Diverse Applications | Redwan Ibne Seraj Khan et.al. | 2411.15997 | null |
2024-11-24 | Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format | Chao Fang et.al. | 2411.15982 | null |
2024-11-24 | Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems | Wenxiang Lin et.al. | 2411.15715 | null |
2025-01-14 | AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution | Fengyuan Liu et.al. | 2411.15102 | link |
2024-11-27 | XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Yixin Dong et.al. | 2411.15100 | null |
2024-11-02 | Transforming Engineering Education Using Generative AI and Digital Twin Technologies | Yu-Zheng Lin et.al. | 2411.14433 | null |
2024-11-21 | InstCache: A Predictive Cache for LLM Serving | Longwei Zou et.al. | 2411.13820 | null |
2024-11-21 | Disentangling Memory and Reasoning Ability in Large Language Models | Mingyu Jin et.al. | 2411.13504 | link |
2024-11-27 | Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding | Hyun Ryu et.al. | 2411.13157 | null |
2024-11-21 | LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts | Zhuohan Gu et.al. | 2411.13009 | null |
2024-11-15 | An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2 | Pepijn de Reus et.al. | 2411.12758 | link |
2025-01-24 | SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference | Jiho Shin et.al. | 2411.12692 | null |
2024-11-18 | BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration | Yuzong Chen et.al. | 2411.11745 | link |
2024-11-18 | MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs | Shiyi Cao et.al. | 2411.11217 | null |
2024-11-17 | FastDraft: How to Train Your Draft | Ofir Zafrir et.al. | 2411.11055 | null |
2024-12-16 | SAM Decoding: Speculative Decoding via Suffix Automaton | Yuxuan Hu et.al. | 2411.10666 | link |
2024-11-15 | Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity | Zichen Song et.al. | 2411.10069 | null |
2024-11-15 | AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference | Janghwan Lee et.al. | 2411.09909 | null |
2024-11-23 | Squeezed Attention: Accelerating Long Context Length LLM Inference | Coleman Hooper et.al. | 2411.09688 | link |
2024-11-15 | Communication Compression for Tensor Parallel LLM Inference | Jan Hansen-Palmus et.al. | 2411.09510 | null |
2024-11-14 | Pie: Pooling CPU Memory for LLM Inference | Yi Xu et.al. | 2411.09317 | null |
2025-01-23 | Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism | Libo Wang et.al. | 2411.09111 | link |
2024-11-12 | Towards Low-bit Communication for Tensor Parallel LLM Inference | Harry Dong et.al. | 2411.07942 | null |
2024-12-12 | ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization | Weibo Zhao et.al. | 2411.07762 | null |
2025-01-08 | BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks | Shubham Gandhi et.al. | 2411.07464 | null |
2024-11-19 | The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving | Kyoungmin Kim et.al. | 2411.07447 | null |
2024-11-10 | EcoServe: Maximizing Multi-Resource Utilization with SLO Guarantees in LLM Serving | Haiying Shen et.al. | 2411.06364 | null |
2024-11-08 | SSSD: Simply-Scalable Speculative Decoding | Michele Marzollo et.al. | 2411.05894 | null |
2024-11-08 | AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality | Ilias Bournias et.al. | 2411.05555 | null |
2024-11-07 | Hardware and Software Platform Inference | Cheng Zhang et.al. | 2411.05197 | null |
2024-10-22 | Scattered Forest Search: Smarter Code Space Exploration with LLMs | Jonathan Light et.al. | 2411.05010 | null |
2024-11-07 | SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference | Gabriele Oliaro et.al. | 2411.04975 | null |
2024-11-05 | CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration | Hongpeng Jin et.al. | 2411.02829 | null |
2024-12-19 | DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving | Yuhan Liu et.al. | 2411.02820 | null |
2024-11-10 | Context Parallelism for Scalable Million-Token Inference | Amy Yang et.al. | 2411.01783 | null |
2024-11-04 | RAGViz: Diagnose and Visualize Retrieval-Augmented Generation | Tevin Wang et.al. | 2411.01751 | link |
2024-11-03 | Autoformulation of Mathematical Optimization Models Using LLMs | Nicolás Astorga et.al. | 2411.01679 | null |
2024-11-06 | HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference | Peng Tang et.al. | 2411.01433 | null |
2024-11-02 | RA-WEBs: Remote Attestation for WEB services | Kosei Akama et.al. | 2411.01340 | null |
2024-11-02 | NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference | Xuanlin Jiang et.al. | 2411.01142 | null |
2024-10-30 | A Theoretical Perspective for Speculative Decoding Algorithm | Ming Yin et.al. | 2411.00841 | null |
2024-11-01 | Phase Diagram of Vision Large Language Models Inference: A Perspective from Interaction across Image and Instruction | Houjing Wei et.al. | 2411.00646 | null |
2024-11-01 | LLM-Based Misconfiguration Detection for AWS Serverless Computing | Jinfeng Wen et.al. | 2411.00642 | null |
2024-12-08 | ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models | Anbang Wang et.al. | 2411.00533 | null |
2024-11-01 | Attention Tracker: Detecting Prompt Injection Attacks in LLMs | Kuo-Han Hung et.al. | 2411.00348 | null |
2024-10-31 | LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators | Krishna Teja Chitty-Venkata et.al. | 2411.00136 | link |
2024-10-31 | Interpretable Language Modeling via Induction-head Ngram Models | Eunji Kim et.al. | 2411.00066 | link |
2024-10-31 | ALISE: Accelerating Large Language Model Serving with Speculative Scheduling | Youpeng Zhao et.al. | 2410.23537 | null |
2024-10-30 | BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference | Junqi Zhao et.al. | 2410.23079 | link |
2024-10-29 | Scaling LLM Inference with Optimized Sample Compute Allocation | Kexun Zhang et.al. | 2410.22480 | link |
2024-10-29 | SVIP: Towards Verifiable Inference of Open-source Large Language Models | Yifan Sun et.al. | 2410.22307 | null |
2025-02-08 | ProMoE: Fast MoE-based LLM Serving using Proactive Caching | Xiaoniu Song et.al. | 2410.22134 | null |
2025-01-21 | MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression | Noel Elias et.al. | 2410.21548 | link |
2024-10-28 | ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | Hanshi Sun et.al. | 2410.21465 | link |
2024-10-27 | FIRP: Faster LLM inference via future intermediate representation prediction | Pengfei Wu et.al. | 2410.20488 | null |
2024-10-29 | Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management | Tuowei Wang et.al. | 2410.19274 | null |
2024-10-24 | Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Ruisi Cai et.al. | 2410.19123 | link |
2024-10-30 | Dynamic Vocabulary Pruning in Early-Exit LLMs | Jort Vincenti et.al. | 2410.18952 | link |
2024-10-25 | A Survey on Speech Large Language Models | Jing Peng et.al. | 2410.18908 | null |
2024-10-24 | A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs | Ankit Singh Rawat et.al. | 2410.18779 | null |
2024-10-24 | BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching | Peizhuang Cong et.al. | 2410.18701 | null |
2024-10-23 | CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation | Qinsi Wang et.al. | 2410.18311 | null |
2024-10-25 | Fast Inference for Augmented Large Language Models | Rana Shahout et.al. | 2410.18248 | null |
2024-10-23 | POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference | Aditya K Kamath et.al. | 2410.18038 | null |
2024-12-29 | AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning | Yehonathan Refael et.al. | 2410.17881 | null |
2024-10-22 | FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs | Haoran Lin et.al. | 2410.16663 | null |
2024-10-22 | Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency | Prafulla Kumar Choubey et.al. | 2410.16597 | null |
2024-12-18 | MagicPIG: LSH Sampling for Efficient LLM Generation | Zhuoming Chen et.al. | 2410.16179 | link |
2024-10-21 | Natural GaLore: Accelerating GaLore for memory-efficient LLM Training and Fine-tuning | Arijit Das et.al. | 2410.16029 | link |
2024-10-21 | RAC: Efficient LLM Factuality Correction with Retrieval Augmentation | Changmao Li et.al. | 2410.15667 | link |
2024-10-21 | Bayesian Concept Bottleneck Models with LLM Priors | Jean Feng et.al. | 2410.15555 | link |
2024-10-20 | CompAct: Compressed Activations for Memory-Efficient LLM Training | Yara Shamshoum et.al. | 2410.15352 | null |
2024-10-20 | EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models | Junhao Hu et.al. | 2410.15332 | null |
2024-10-19 | IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System | Minseok Seo et.al. | 2410.15008 | null |
2024-10-23 | Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching | Jie Peng et.al. | 2410.14740 | null |
2024-10-18 | A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference | You Wu et.al. | 2410.14442 | link |
2024-10-18 | Revisiting SLO and Goodput Metrics in LLM Serving | Zhibin Wang et.al. | 2410.14257 | null |
2024-10-18 | Leveraging Large Language Models for Enhancing Public Transit Services | Jiahao Wang et.al. | 2410.14147 | null |
2024-10-17 | RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs | Jiatan Huang et.al. | 2410.13987 | null |
2024-11-07 | Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Tianyu Guo et.al. | 2410.13835 | link |
2024-10-17 | Progressive Mixed-Precision Decoding for Efficient LLM Inference | Hao Mark Chen et.al. | 2410.13461 | null |
2024-10-17 | Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning | Minseok Choi et.al. | 2410.13274 | null |
2024-10-17 | Data Defenses Against Large Language Models | William Agnew et.al. | 2410.13138 | link |
2024-10-19 | In-context KV-Cache Eviction for LLMs via Attention-Gate | Zihao Zeng et.al. | 2410.12876 | null |
2024-10-10 | RecurFormer: Not All Transformer Heads Need Self-Attention | Ruiqing Yan et.al. | 2410.12850 | null |
2024-10-16 | COMET: Towards Partical W4A4KV4 LLMs Serving | Lian Liu et.al. | 2410.12168 | null |
2024-10-16 | Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning | Huiwen Wu et.al. | 2410.12130 | null |
2024-10-15 | Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix | Yingyu Liang et.al. | 2410.11261 | null |
2024-10-06 | Continuous Approximations for Improving Quantization Aware Training of LLMs | He Li et.al. | 2410.10849 | null |
2024-10-14 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Guangxuan Xiao et.al. | 2410.10819 | link |
2024-10-16 | SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization | Akrit Mudvari et.al. | 2410.10759 | null |
2024-10-12 | Power-Softmax: Towards Secure LLM Inference over Encrypted Data | Itamar Zimerman et.al. | 2410.09457 | null |
2024-10-11 | Large Language Models for Energy-Efficient Code: Emerging Results and Future Directions | Huiyun Peng et.al. | 2410.09241 | null |
2024-10-11 | SubZero: Random Subspace Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning | Ziming Yu et.al. | 2410.08989 | link |
2024-12-03 | HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework | Yinuo Ren et.al. | 2410.08316 | null |
2024-10-14 | Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining | Tianyi Bai et.al. | 2410.08102 | link |
2024-10-09 | SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration | Heming Xia et.al. | 2410.06916 | link |
2024-10-08 | Active Evaluation Acquisition for Efficient LLM Benchmarking | Yang Li et.al. | 2410.05952 | null |
2024-10-08 | Exploring the Meaningfulness of Nearest Neighbor Search in High-Dimensional Space | Zhonghan Chen et.al. | 2410.05752 | null |
2024-10-08 | ParallelSpec: Parallel Drafter for Efficient Speculative Decoding | Zilin Xiao et.al. | 2410.05589 | null |
2024-10-07 | Fast State Restoration in LLM Serving with HCache | Shiwei Gao et.al. | 2410.05004 | null |
2024-10-06 | RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference | Yige Xu et.al. | 2410.04519 | link |
2025-01-23 | Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective | Jinhao Li et.al. | 2410.04466 | null |
2024-12-05 | SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | Aurick Qiao et.al. | 2410.03960 | null |
2024-10-04 | LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity | Selim Furkan Tekin et.al. | 2410.03953 | link |
2024-10-04 | EXAQ: Exponent Aware Quantization For LLMs Acceleration | Moran Shkolnik et.al. | 2410.03185 | link |
2024-10-04 | UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference | Jing Xiong et.al. | 2410.03090 | null |
2024-10-03 | LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences | Zhenxiao Fu et.al. | 2410.02950 | null |
2024-10-03 | Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration | Yun Qu et.al. | 2410.02511 | link |
2024-10-03 | LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services | Małgorzata Łazuka et.al. | 2410.02425 | link |
2024-10-04 | Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation | Xiaoqun Liu et.al. | 2410.02220 | null |
2024-10-05 | Aligning with Logic: Measuring, Evaluating and Improving Logical Consistency in Large Language Models | Yinhong Liu et.al. | 2410.02205 | null |
2024-10-02 | Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads | Yuxiang Huang et.al. | 2410.01805 | link |
2024-10-02 | ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving | Yifan Qiao et.al. | 2410.01228 | null |
2024-10-01 | TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices | Zonghang Li et.al. | 2410.00531 | link |
2024-10-09 | LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | Yi Xiong et.al. | 2410.00428 | null |
2024-11-06 | The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems | Linke Song et.al. | 2409.20002 | null |
2024-09-28 | SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models | Yi Wu et.al. | 2409.19471 | null |
2024-11-28 | Confidential Prompting: Protecting User Prompts from Cloud LLM Providers | In Gim et.al. | 2409.19134 | link |
2024-09-26 | Control Industrial Automation System with Large Language Models | Yuchen Xia et.al. | 2409.18009 | link |
2024-10-18 | Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores | Shaobo Ma et.al. | 2409.17870 | null |
2024-09-25 | Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | Zhenmei Shi et.al. | 2409.17422 | link |
2024-09-25 | Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations | Amey Agrawal et.al. | 2409.17264 | null |
2024-09-25 | Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale | Fan Zhou et.al. | 2409.17115 | link |
2024-09-25 | Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference | Zongyue Qin et.al. | 2409.16560 | null |
2024-10-21 | AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization | Yifan Tan et.al. | 2409.16546 | link |
2024-11-07 | Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines | Lei Gao et.al. | 2409.15520 | link |
2024-10-29 | Eagle: Efficient Training-Free Router for Multi-LLM Inference | Zesen Zhao et.al. | 2409.15518 | null |
2024-10-03 | Archon: An Architecture Search Framework for Inference-Time Techniques | Jon Saad-Falcon et.al. | 2409.15254 | link |
2024-09-23 | CSPS: A Communication-Efficient Sequence-Parallelism based Serving System for Transformer based Models with Long Prompts | Zeyu Zhang et.al. | 2409.15104 | null |
2024-09-24 | UELLM: A Unified and Efficient Approach for LLM Inference Serving | Yiyuan He et.al. | 2409.14961 | null |
2024-11-01 | RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph | Lindsey Linxi Wei et.al. | 2409.14556 | null |
2024-09-21 | Practically implementing an LLM-supported collaborative vulnerability remediation process: a team-based approach | Xiaoqing Wang et.al. | 2409.14058 | null |
2024-10-21 | Do Large Language Models Need a Content Delivery Network? | Yihua Cheng et.al. | 2409.13761 | link |
2024-09-19 | PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs) | Mahmoud Nazzal et.al. | 2409.12699 | link |
2024-09-12 | LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs | Han Xu et.al. | 2409.11424 | null |
2024-09-04 | ISO: Overlap of Computation and Communication within Seqenence For LLM Inference | Bin Xiao et.al. | 2409.11155 | null |
2024-12-31 | RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | Di Liu et.al. | 2409.10516 | link |
2024-09-12 | Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat | Sidong Feng et.al. | 2409.07829 | null |
2024-09-13 | LLM-Enhanced Software Patch Localization | Jinhong Yu et.al. | 2409.06816 | null |
2024-09-24 | OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models | Jahyun Koo et.al. | 2409.05902 | null |
2024-09-08 | InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference | Xiurui Pan et.al. | 2409.04992 | null |
2024-09-07 | Achieving Peak Performance for Large Language Models: A Systematic Review | Zhyar Rzgar K Rostam et.al. | 2409.04833 | null |
2024-09-06 | Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance | Guanyu Lin et.al. | 2409.04593 | null |
2024-09-06 | A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage | Huan Yang et.al. | 2409.04040 | null |
2024-11-05 | Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study | Jianwei Zhu et.al. | 2409.03992 | null |
2024-09-05 | Sirius: Contextual Sparsity with Correction for Efficient LLMs | Yang Zhou et.al. | 2409.03856 | link |
2024-08-31 | HSF: Defending against Jailbreak Attacks with Hidden State Filtering | Cheng Qian et.al. | 2409.03788 | null |
2024-12-11 | Efficient Large Foundation Model Inference: A Perspective From Model and System Co-Design | Dong Liu et.al. | 2409.01990 | null |
2024-09-03 | Efficient LLM Context Distillation | Rajesh Upadhayayaya et.al. | 2409.01930 | null |
2024-09-03 | Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information | Xinyu Zhang et.al. | 2409.01605 | null |
2024-09-02 | CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification | Junhui He et.al. | 2409.01366 | null |
2024-12-18 | Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference | Barys Liskavets et.al. | 2409.01227 | null |
2024-09-01 | Research on LLM Acceleration Using the High-Performance RISC-V Processor “Xiangshan” (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product) | Xu-Hao Chen et.al. | 2409.00661 | null |
2024-11-10 | Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling | Guangya Wan et.al. | 2408.17017 | null |
2024-08-28 | Decentralized LLM Inference over Edge Networks with Energy Harvesting | Aria Khoshsirat et.al. | 2408.15907 | null |
2024-08-28 | Efficient LLM Scheduling by Learning to Rank | Yichao Fu et.al. | 2408.15792 | link |
2024-08-28 | Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation | Lujun Gui et.al. | 2408.15562 | null |
2024-08-23 | Memory-Efficient LLM Training with Online Subspace Descent | Kaizhao Liang et.al. | 2408.12857 | link |
2024-08-22 | NanoFlow: Towards Optimal Large Language Model Serving Throughput | Kan Zhu et.al. | 2408.12757 | link |
2024-10-23 | TensorOpera Router: A Multi-Model Router for Efficient LLM Inference | Dimitris Stripelis et.al. | 2408.12320 | null |
2024-09-04 | Parallel Speculative Decoding with Adaptive Draft Length | Tianyu Liu et.al. | 2408.11850 | link |
2024-08-21 | MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Elias Frantar et.al. | 2408.11743 | link |
2024-08-23 | Xinyu: An Efficient LLM-based System for Commentary Generation | Yiquan Wu et.al. | 2408.11609 | null |
2024-08-21 | Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning | Kai Xiong et.al. | 2408.11431 | null |
2024-08-21 | Image Score: Learning and Evaluating Human Preferences for Mercari Search | Chingis Oinar et.al. | 2408.11349 | null |
2024-08-20 | Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models | Artem Vazhentsev et.al. | 2408.10692 | null |
2024-08-20 | How Well Do Large Language Models Serve as End-to-End Secure Code Producers? | Jianian Gong et.al. | 2408.10495 | null |
2024-09-29 | GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making | Arsham Gholamzadeh Khoee et.al. | 2408.09785 | null |
2024-08-19 | PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars | Sumanth Prabhu et.al. | 2408.08869 | null |
2024-08-23 | ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models | Chao Zeng et.al. | 2408.08554 | link |
2024-08-14 | LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference | Seungjae Moon et.al. | 2408.07326 | null |
2024-08-12 | LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration | Zhiwen Mo et.al. | 2408.06003 | null |
2024-08-16 | Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion | Jacob K Christopher et.al. | 2408.05636 | null |
2024-08-10 | LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Jaehong Cho et.al. | 2408.05499 | link |
2024-08-05 | SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving | Andreas Kosmas Kakolyris et.al. | 2408.05235 | null |
2024-09-14 | Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness | Xiaojing Fan et.al. | 2408.04585 | null |
2024-08-08 | Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning | Ke Cheng et.al. | 2408.04323 | null |
2024-08-07 | Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference | Zeyu Zhang et.al. | 2408.04107 | null |
2024-08-07 | MPC-Minimized Secure LLM Inference | Deevashwer Rathee et.al. | 2408.03561 | null |
2024-08-06 | Can LLMs Serve As Time Series Anomaly Detectors? | Manqing Dong et.al. | 2408.03475 | null |
2024-08-05 | Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning | Hao Zhou et.al. | 2408.02549 | null |
2024-08-02 | The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines | Matias Martinez et.al. | 2408.01050 | null |
2024-08-01 | DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency | Jovan Stojkovic et.al. | 2408.00741 | null |
2024-08-01 | Designing Efficient LLM Accelerators for Edge Devices | Jude Haris et.al. | 2408.00462 | null |
2024-08-01 | Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control | Hao Zhou et.al. | 2408.00214 | null |
2024-09-10 | ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency | Yuhang Yao et.al. | 2408.00008 | null |
2024-08-01 | Responsive ML inference in multi-tenanted environments using AQUA | Abhishek Vijaya Kumar et.al. | 2407.21255 | null |
2024-11-04 | Palu: Compressing KV-Cache with Low-Rank Projection | Chi-Chih Chang et.al. | 2407.21118 | link |
2024-07-30 | Accelerating Large Language Model Inference with Self-Supervised Early Exits | Florian Valade et.al. | 2407.21082 | null |
2024-10-03 | ThinK: Thinner Key Cache by Query-Driven Pruning | Yuhui Xu et.al. | 2407.21018 | null |
2024-07-25 | An Efficient Inference Framework for Early-exit Large Language Models | Ruijie Miao et.al. | 2407.20272 | null |
2024-07-29 | Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | Sania Nayab et.al. | 2407.19825 | null |
2024-07-29 | Teaching LLMs at Charles University: Assignments and Activities | Jindřich Helcl et.al. | 2407.19798 | null |
2024-07-09 | Mobile Edge Intelligence for Large Language Models: A Contemporary Survey | Guanqiao Qu et.al. | 2407.18921 | null |
2024-07-04 | The Price of Prompting: Profiling Energy Use in Large Language Models Inference | Erik Johannes Husom et.al. | 2407.16893 | link |
2024-07-23 | PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets | Jaeyoung Kim et.al. | 2407.16329 | null |
2024-07-22 | RazorAttention: Efficient KV Cache Compression Through Retrieval Heads | Hanlin Tang et.al. | 2407.15891 | null |
2024-07-22 | vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving | Jiale Xu et.al. | 2407.15309 | link |
2024-07-20 | All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks | Ajay Jaiswal et.al. | 2407.14996 | null |
2024-07-19 | LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference | Qichen Fu et.al. | 2407.14057 | null |
2024-07-13 | Beyond KV Caching: Shared Attention for Efficient LLMs | Bingli Liao et.al. | 2407.12866 | link |
2024-07-01 | PQCache: Product Quantization-based KVCache for Long Context LLM Inference | Hailin Zhang et.al. | 2407.12820 | null |
2024-07-17 | Struct-X: Enhancing Large Language Models Reasoning with Structured Data | Xiaoyu Tan et.al. | 2407.12522 | null |
2024-07-17 | LLM Inference Serving: Survey of Recent Advances and Opportunities | Baolin Li et.al. | 2407.12391 | null |
2024-10-11 | Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale | Ayush Kaushal et.al. | 2407.12327 | link |
2024-11-16 | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | Branden Butler et.al. | 2407.11798 | null |
2024-08-16 | Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference | Yuan Feng et.al. | 2407.11550 | link |
2024-07-15 | Static Detection of Filesystem Vulnerabilities in Android Systems | Yu-Tsung Lee et.al. | 2407.11279 | null |
2024-10-03 | Fast Matrix Multiplications for Lookup Table-Quantized LLMs | Han Guo et.al. | 2407.10960 | link |
2024-10-02 | Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference | Zongyue Qin et.al. | 2407.09722 | null |
2024-08-30 | Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems | Amey Agrawal et.al. | 2407.07000 | link |
2024-07-08 | Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU | Daliang Xu et.al. | 2407.05858 | link |
2024-07-07 | A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length | Yuqing Yang et.al. | 2407.05347 | null |
2024-07-06 | Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning | Yun-Da Tsai et.al. | 2407.05040 | null |
2024-11-16 | Software-Hardware Co-Design For Embodied AI Robots | Yiyang Huang et.al. | 2407.04292 | link |
2024-07-04 | Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems | Grant Wilkins et.al. | 2407.04014 | null |
2024-10-30 | MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Huiqiang Jiang et.al. | 2407.02490 | link |
2024-06-29 | When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration | Philipp Allgeuer et.al. | 2407.00518 | link |
2024-06-29 | Teola: Towards End-to-End Optimization of LLM-based Applications | Xin Tan et.al. | 2407.00326 | null |
2024-06-25 | T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | Jianyu Wei et.al. | 2407.00088 | link |
2024-07-09 | Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Ruoyu Qin et.al. | 2407.00079 | link |
2024-06-28 | InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management | Wonbeom Lee et.al. | 2406.19707 | null |
2024-08-28 | AI-native Memory: A Pathway from LLMs Towards AGI | Jingbo Shang et.al. | 2406.18312 | null |
2024-06-25 | FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model | Feijie Wu et.al. | 2406.17706 | link |
2024-06-26 | MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool | Cunchen Hu et.al. | 2406.17565 | null |
2024-11-11 | Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters | Euiin Yi et.al. | 2406.16758 | link |
LLM Scheduling
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-05-29 | Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters | Hayden Moore et.al. | 2505.23554 | null |
2025-05-14 | ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor | Seungbeom Choi et.al. | 2505.09142 | null |
2025-06-08 | PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference | Zeyu Zhang et.al. | 2409.15104 | null |
2024-08-28 | Efficient LLM Scheduling by Learning to Rank | Yichao Fu et.al. | 2408.15792 | link |
MoE
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-26 | Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts | Jiajie Yang et.al. | 2506.21328 | null |
2025-06-26 | Learning to Skip the Middle Layers of Transformers | Tim Lawson et.al. | 2506.21103 | null |
2025-06-26 | Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning | Haodong Lu et.al. | 2506.21035 | null |
2025-06-26 | EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning | Xiao Zhang et.al. | 2506.20986 | null |
2025-06-25 | The Singapore Consensus on Global AI Safety Research Priorities | Yoshua Bengio et.al. | 2506.20702 | null |
2025-06-17 | Utility-Driven Speculative Decoding for Mixture-of-Experts | Anish Saxena et.al. | 2506.20675 | null |
2025-06-25 | Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration | Jiaxing Huang et.al. | 2506.20282 | null |
2025-06-24 | Integrating Pair Programming as a Work Practice | Nina Haugland Andersen et.al. | 2506.19511 | null |
2025-06-24 | The Hα line as a probe of chromospheric magnetic fields | Harsh Mathur et.al. | 2506.19510 | null |
2025-06-23 | Multimodal Anomaly Detection with a Mixture-of-Experts | Christoph Willibald et.al. | 2506.19077 | null |
2025-06-23 | Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models | Zihan Wang et.al. | 2506.18945 | null |
2025-06-23 | Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning | Rahul Atul Bhope et.al. | 2506.18789 | null |
2025-06-23 | An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify | Shivam Verma et.al. | 2506.18735 | null |
2025-06-23 | Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks | Xiaodong Wu et.al. | 2506.18543 | null |
2025-06-23 | SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation | Zichong Li et.al. | 2506.18349 | null |
2025-06-23 | Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies | Junchao Fan et.al. | 2506.18304 | null |
2025-06-22 | Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection | Zheng Zhan et.al. | 2506.18145 | null |
2025-06-21 | Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert | Gelei Xu et.al. | 2506.17787 | null |
2025-06-21 | Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities | Xinghao Huang et.al. | 2506.17755 | null |
2025-06-21 | PDC-Net: Pattern Divide-and-Conquer Network for Pelvic Radiation Injury Segmentation | Xinyu Xiong et.al. | 2506.17712 | null |
2025-06-20 | SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification | Zhenglin Lai et.al. | 2506.17368 | null |
2025-06-19 | FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE | Khiem Le et.al. | 2506.16600 | null |
2025-06-19 | Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models | Daniel Fidel Harvey et.al. | 2506.16419 | null |
2025-06-19 | DCFNet: Doppler Correction Filter Network for Integrated Sensing and Communication in Multi-User MIMO-OFDM Systems | Hyeonho Noh et.al. | 2506.16191 | null |
2025-06-17 | Scaling Intelligence: Designing Data Centers for Next-Gen Language Models | Jesmin Jahan Tithi et.al. | 2506.15006 | null |
2025-06-17 | NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification | Wajih Hassan Raza et.al. | 2506.14970 | null |
2025-06-17 | Narrowing the Gap between TEEs Threat Model and Deployment Strategies | Filip Rezabek et.al. | 2506.14964 | null |
2025-05-31 | Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors | Henrik Klagges et.al. | 2506.14794 | null |
2025-06-19 | Integrating Dynamical Systems Learning with Foundational Models: A Meta-Evolutionary AI Framework for Clinical Trials | Joseph Geraci et.al. | 2506.14782 | null |
2025-06-17 | GMT: General Motion Tracking for Humanoid Whole-Body Control | Zixuan Chen et.al. | 2506.14770 | null |
2025-06-17 | Exploring Speaker Diarization with Mixture of Experts | Gaobin Yang et.al. | 2506.14750 | null |
2025-06-18 | Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Ling Team et.al. | 2506.14731 | null |
2025-06-17 | GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors | Hengyuan Zhang et.al. | 2506.14646 | link |
2025-06-17 | Single-Example Learning in a Mixture of GPDMs with Latent Geometries | Jesse St. Amand et.al. | 2506.14563 | null |
2025-06-21 | MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation | Shen Yuan et.al. | 2506.14436 | link |
2025-06-17 | MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models | Hongyu Wang et.al. | 2506.14435 | null |
2025-06-17 | Less is More: Undertraining Experts Improves Model Upcycling | Stefan Horoi et.al. | 2506.14126 | null |
2025-06-16 | Load Balancing Mixture of Experts with Similarity Preserving Routers | Nabil Omi et.al. | 2506.14038 | null |
2025-06-16 | GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics | Qianzhong Chen et.al. | 2506.14009 | null |
2025-06-16 | MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | MiniMax et.al. | 2506.13585 | link |
2025-06-16 | Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization | Guanghui Song et.al. | 2506.13541 | null |
2025-06-16 | EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization | Zhongqian Fu et.al. | 2506.13329 | link |
2025-06-16 | Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs | Xintong Tang et.al. | 2506.13192 | null |
2025-06-19 | Serving Large Language Models on Huawei CloudMatrix384 | Pengfei Zuo et.al. | 2506.12708 | null |
2025-06-14 | Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts | Shengzhuang Chen et.al. | 2506.12597 | null |
2025-06-14 | Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control | Rongpeng Li et.al. | 2506.12453 | null |
2025-06-17 | HarMoEny: Efficient Multi-GPU Inference of MoE Models | Zachary Doucet et.al. | 2506.12417 | null |
2025-06-14 | Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model | Chong Li et.al. | 2506.12388 | null |
2025-06-13 | Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources? | Houyi Li et.al. | 2506.12119 | null |
2025-06-13 | Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution | Zhangkai Ni et.al. | 2506.11823 | link |
2025-05-21 | MoTE: Mixture of Task-specific Experts for Pre-Trained Model-Based Class-incremental Learning | Linjie Li et.al. | 2506.11038 | null |
2025-04-23 | Test code generation at Ericsson using Program Analysis Augmented Fine Tuned LLMs | Sai Krishna et.al. | 2506.11006 | null |
2025-06-12 | Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts | Zaijing Li et.al. | 2506.10357 | null |
2025-06-12 | Technical Report with Proofs for A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments | Philipp Bär et.al. | 2506.10345 | null |
2025-06-13 | A Survey of Generative Categories and Techniques in Multimodal Large Language Models | Longzhen Han et.al. | 2506.10016 | null |
2025-06-11 | GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture | GigaChat team et.al. | 2506.09440 | null |
2025-06-11 | DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts | Yuchen Feng et.al. | 2506.09351 | null |
2025-06-11 | Ming-Omni: A Unified Multimodal Model for Perception and Generation | Inclusion AI et.al. | 2506.09344 | link |
2025-06-10 | CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks | Yixuan Li et.al. | 2506.08931 | null |
2025-06-10 | CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA | Jiale Dong et.al. | 2506.08496 | link |
2025-06-11 | MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding | Shivang Chopra et.al. | 2506.08356 | null |
2025-06-09 | Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting | Timothée Hornek et.al. | 2506.08113 | null |
2025-06-11 | STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation | Yiming Wang et.al. | 2506.08054 | link |
2025-06-09 | A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling | Jacob Helwig et.al. | 2506.07969 | link |
2025-06-09 | New Insights into the T Tauri Binary Separation Distribution | Caleb Eastlund et.al. | 2506.07938 | null |
2025-06-09 | M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration | Yongzhen Wang et.al. | 2506.07814 | null |
2025-06-11 | MIRA: Medical Time Series Foundation Model for Real-World Health Data | Hao Li et.al. | 2506.07584 | null |
2025-06-11 | MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization | Ken Yaggel et.al. | 2506.07563 | link |
2025-06-09 | MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts | Wei Tao et.al. | 2506.07533 | null |
2025-06-09 | Graph-of-Causal Evolution: Challenging Chain-of-Model for Reasoning | Libo Wang et.al. | 2506.07501 | null |
2025-06-09 | MoE-GPS: Guidelines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing | Haiyue Ma et.al. | 2506.07366 | null |
2025-06-08 | UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment | Wentao Zhao et.al. | 2506.07013 | null |
2025-06-07 | High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations | Ziwei Li et.al. | 2506.06858 | null |
2025-06-07 | Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning | Yuan Yuan et.al. | 2506.06694 | null |
2025-06-25 | SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities | Guoyang Xia et.al. | 2506.06406 | null |
2025-05-27 | MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes | Feiyang Pan et.al. | 2506.06318 | null |
2025-06-06 | Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization | Jonathan Yang et.al. | 2506.06196 | null |
2025-06-06 | MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models | Jie Cao et.al. | 2506.05928 | null |
2025-06-06 | dots.llm1 Technical Report | Bi Huo et.al. | 2506.05767 | null |
2025-06-05 | Mixture-of-Experts Meets In-Context Reinforcement Learning | Wenhao Wu et.al. | 2506.05426 | null |
2025-06-20 | Kinetics: Rethinking Test-Time Scaling Laws | Ranajoy Sadhukhan et.al. | 2506.05333 | link |
2025-06-05 | Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection | Ziyi Zhou et.al. | 2506.04739 | null |
2025-06-09 | FlashDMoE: Fast Distributed MoE in a Single Kernel | Osayamen Jonathan Aimuyo et.al. | 2506.04667 | link |
2025-06-04 | Out-of-Distribution Graph Models Merging | Yidi Wang et.al. | 2506.03674 | null |
2025-06-04 | Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts | Jiaxing Zhang et.al. | 2506.03591 | null |
2025-06-04 | PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs | Ze Yu Zhang et.al. | 2506.02965 | null |
2025-06-03 | Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights | Jakub Krajewski et.al. | 2506.02890 | null |
2025-06-03 | Brain-Like Processing Pathways Form in Models With Heterogeneous Experts | Jack Cook et.al. | 2506.02813 | null |
2025-06-04 | MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection | Juntong Li et.al. | 2506.02535 | null |
2025-06-03 | MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework | Yupeng Qi et.al. | 2506.02460 | null |
2025-05-31 | Enhancing Multimodal Continual Instruction Tuning with BranchLoRA | Duzhen Zhang et.al. | 2506.02041 | null |
2025-06-02 | SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model | Zhao Yang et.al. | 2506.01833 | link |
2025-06-02 | Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning | Ryotaro Kawata et.al. | 2506.01656 | null |
2025-06-02 | DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models | Jiancheng Ye et.al. | 2506.01257 | null |
2025-06-01 | Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts | Fan Liu et.al. | 2506.00965 | null |
2025-05-31 | FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts | Xinyi Wang et.al. | 2506.00495 | null |
2025-05-30 | Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction | Shuai Liu et.al. | 2505.24597 | null |
2025-06-11 | Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis | Junzhuo Li et.al. | 2505.24593 | null |
2025-05-30 | Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer | Yilun Kong et.al. | 2505.24378 | link |
2025-05-30 | GradPower: Powering Gradients for Faster Language Model Pre-Training | Mingze Wang et.al. | 2505.24275 | null |
2025-05-30 | On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks | Mingze Wang et.al. | 2505.24205 | null |
2025-05-29 | Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts | Xuweiyi Chen et.al. | 2505.23926 | null |
2025-06-09 | Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert | Zhaokun Wang et.al. | 2505.23868 | null |
2025-05-29 | Revisiting Uncertainty Estimation and Calibration of Large Language Models | Linwei Tao et.al. | 2505.23854 | null |
2025-05-28 | EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models | Linglin Jing et.al. | 2505.23830 | null |
2025-06-03 | LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions | Hadi Askari et.al. | 2505.23811 | null |
2025-05-29 | From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents | Tobias Lindenbauer et.al. | 2505.23422 | link |
2025-05-29 | Context-Aware Semantic Communication for the Wireless Networks | Guangyuan Liu et.al. | 2505.23249 | null |
2025-05-29 | Two Is Better Than One: Rotations Scale LoRAs | Hongcan Guo et.al. | 2505.23184 | null |
2025-05-28 | HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | Qi Cai et.al. | 2505.22705 | link |
2025-05-28 | Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts | Xue Zhang et.al. | 2505.22582 | null |
2025-05-28 | A Human-Centric Approach to Explainable AI for Personalized Education | Vinitra Swamy et.al. | 2505.22541 | link |
2025-05-28 | Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion | Kewen Chen et.al. | 2505.22360 | null |
2025-05-28 | Advancing Expert Specialization for Better MoE | Hongcan Guo et.al. | 2505.22323 | null |
2025-05-28 | ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation | Jiawen Yu et.al. | 2505.22159 | null |
2025-05-28 | On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition | Shujie Hu et.al. | 2505.22072 | null |
2025-05-28 | AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation | Yan Rong et.al. | 2505.22053 | null |
2025-05-29 | ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | Zhongyi Zhou et.al. | 2505.21906 | null |
2025-05-27 | MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis | Yitong Li et.al. | 2505.21698 | null |
2025-05-23 | EvidenceMoE: A Physics-Guided Mixture-of-Experts with Evidential Critics for Advancing Fluorescence Light Detection and Ranging in Scattering Media | Ismail Erbas et.al. | 2505.21532 | null |
2025-05-28 | Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity | Yehui Tang et.al. | 2505.21411 | null |
2025-05-27 | Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM’s Instruction-Following Capabilities | Junyan Zhang et.al. | 2505.21191 | null |
2025-05-27 | Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts | Yue Zhang et.al. | 2505.21079 | null |
2025-05-27 | Multi-objective Large Language Model Alignment with Hierarchical Experts | Zhuo Li et.al. | 2505.20925 | null |
2025-05-26 | FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models | Hao Kang et.al. | 2505.20225 | link |
2025-06-01 | NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID | Shihao Li et.al. | 2505.20001 | null |
2025-05-26 | Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments | Junming Liu et.al. | 2505.19699 | null |
2025-06-13 | MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE | Zongle Huang et.al. | 2505.19645 | null |
2025-05-26 | Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate | Liangwei Nathan Zheng et.al. | 2505.19525 | link |
2025-05-26 | WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference | Sihan Chen et.al. | 2505.19427 | link |
2025-05-25 | RankLLM: A Python Package for Reranking with LLMs | Sahel Sharifymoghaddam et.al. | 2505.19284 | null |
2025-05-25 | I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Jiayi Xin et.al. | 2505.19190 | link |
2025-05-24 | TrajMoE: Spatially-Aware Mixture of Experts for Unified Human Mobility Modeling | Chonghua Han et.al. | 2505.18670 | null |
2025-05-24 | ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation | Jian Liang et.al. | 2505.18640 | link |
2025-05-24 | Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter | Weizhi Zhong et.al. | 2505.18612 | null |
2025-05-24 | Guiding the Experts: Semantic Priors for Efficient and Focused MoE Routing | Chengxi Min et.al. | 2505.18586 | link |
2025-05-24 | Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning | Aofei Chang et.al. | 2505.18503 | null |
2025-05-24 | On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts | Fanqi Yan et.al. | 2505.18455 | null |
2025-05-24 | μ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts | Toshiaki Koike-Akino et.al. | 2505.18451 | null |
2025-05-23 | Betelgeuse’s Buddy: X-Ray Constraints on the Nature of α Ori B | Anna J. G. O’Grady et.al. | 2505.18376 | null |
2025-05-23 | Betelgeuse, Betelgeuse, Betelgeuse, Betel-buddy? Constraints on the dynamical companion to α Orionis from HST | Jared A. Goldberg et.al. | 2505.18375 | null |
2025-05-13 | Constrained Edge AI Deployment: Fine-Tuning vs Distillation for LLM Compression | Jacob Sander et.al. | 2505.18166 | null |
2025-05-23 | Enhancing CTR Prediction with De-correlated Expert Networks | Jiancheng Wang et.al. | 2505.17925 | null |
2025-05-23 | PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval | Zehua Pei et.al. | 2505.17639 | null |
2025-05-23 | CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning | Jinyuan Feng et.al. | 2505.17553 | null |
2025-05-31 | MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation | Kaixing Yang et.al. | 2505.17543 | null |
2025-06-02 | JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | Qihao Duan et.al. | 2505.17257 | null |
2025-05-31 | TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling | Weizhe Lin et.al. | 2505.17155 | null |
2025-05-22 | DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving | Zhenjie Yang et.al. | 2505.16278 | null |
2025-05-22 | DualComp: End-to-End Learning of a Unified Dual-Modality Lossless Compressor | Yan Zhao et.al. | 2505.16256 | null |
2025-05-21 | Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models | Jingcong Liang et.al. | 2505.16056 | link |
2025-05-26 | MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding | Yuxiang Wei et.al. | 2505.15946 | null |
2025-05-21 | Who “Controls” Where Work Shall be Done? State-of-Practice in Post-Pandemic Remote Work Regulation | Darja Smite et.al. | 2505.15743 | null |
2025-05-21 | CoLA: Collaborative Low-Rank Adaptation | Yiyun Zhou et.al. | 2505.15471 | link |
2025-05-22 | Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought | Tencent Hunyuan Team et.al. | 2505.15431 | null |
2025-05-21 | Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks | Uranik Berisha et.al. | 2505.15414 | null |
2025-05-21 | Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites | Xintong Wang et.al. | 2505.15297 | null |
2025-05-21 | Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines | Xiaohou Shi et.al. | 2505.15151 | null |
2025-05-20 | Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies | Haoyi Qiu et.al. | 2505.14972 | link |
2025-05-30 | TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis | Yu Zhang et.al. | 2505.14910 | link |
2025-05-20 | Balanced and Elastic End-to-end Training of Dynamic LLMs | Mohamed Wahib et.al. | 2505.14864 | null |
2025-05-20 | Solving MNIST with a globally trained Mixture of Quantum Experts | Paolo Alessandro Xavier Tognini et.al. | 2505.14789 | null |
2025-05-27 | Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training | Mengru Wang et.al. | 2505.14681 | null |
2025-05-21 | Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | Umberto Cappellazzo et.al. | 2505.14336 | null |
2025-05-20 | FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation | Shaolin Zhu et.al. | 2505.14256 | null |
2025-05-20 | THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation | Yunlong Liang et.al. | 2505.14173 | null |
2025-05-20 | Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition | Shuo Zhang et.al. | 2505.14143 | null |
2025-05-20 | Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging | Ryo Bertolissi et.al. | 2505.14136 | null |
2025-05-20 | Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts | Xi Chen et.al. | 2505.14088 | null |
2025-05-20 | StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning | Huaijie Wang et.al. | 2505.13997 | null |
2025-05-20 | Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting | Bao-Ngoc Dao et.al. | 2505.13944 | link |
2025-05-27 | U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding | Ziqian Wang et.al. | 2505.13880 | link |
2025-05-20 | EfficientLLM: Efficiency in Large Language Models | Zhengqing Yuan et.al. | 2505.13840 | null |
2025-05-19 | CompeteSMoE – Statistically Guaranteed Mixture of Experts Training via Competition | Nam V. Nguyen et.al. | 2505.13380 | link |
2025-05-19 | Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | Shuqing Luo et.al. | 2505.13345 | link |
2025-05-19 | Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models | Lucas Berry et.al. | 2505.13273 | null |
2025-05-19 | True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics | Christoph Jürgen Hemmer et.al. | 2505.13192 | null |
2025-05-23 | Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures | Tuan Thai et.al. | 2505.13052 | null |
2025-05-19 | TransferTraj: A Vehicle Trajectory Learning Model for Region and Task Transferability | Tonglong Wei et.al. | 2505.12672 | null |
2025-05-30 | Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization | Hongbiao Zhu et.al. | 2505.12311 | null |
2025-05-22 | Model Merging in Pre-training of Large Language Models | Yunshui Li et.al. | 2505.12082 | null |
2025-05-22 | Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition | Runduo Han et.al. | 2505.12007 | link |
2025-05-17 | MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging | Zihuan Qiu et.al. | 2505.11883 | null |
2025-05-17 | Improving Coverage in Combined Prediction Sets with Weighted p-values | Gina Wong et.al. | 2505.11785 | null |
2025-05-16 | HessFormer: Hessians at Foundation Scale | Diego Granziol et.al. | 2505.11564 | null |
2025-05-10 | PRIME: Physics-Related Intelligent Mixture of Experts for Transistor Characteristics Prediction | Zhenxing Dou et.al. | 2505.11523 | null |
2025-05-19 | MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production | Chao Jin et.al. | 2505.11432 | null |
2025-05-21 | MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | Yinsicheng Jiang et.al. | 2505.11415 | null |
2025-05-16 | A Fast Kernel-based Conditional Independence test with Application to Causal Discovery | Oliver Schacht et.al. | 2505.11085 | null |
2025-05-16 | On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating | Huy Nguyen et.al. | 2505.10860 | null |
2025-05-14 | PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | Zongqian Li et.al. | 2505.09519 | link |
2025-05-14 | Qwen3 Technical Report | An Yang et.al. | 2505.09388 | link |
2025-05-14 | Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures | Chenggang Zhao et.al. | 2505.09343 | null |
2025-05-29 | Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony | Shaoyu Wang et.al. | 2505.08944 | null |
2025-05-13 | PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts | Yang Su et.al. | 2505.08719 | null |
2025-05-25 | AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale | Yunjie Ji et.al. | 2505.08311 | null |
2025-05-12 | UMoE: Unifying Attention and FFN with Shared Experts | Yuanhang Yang et.al. | 2505.07260 | null |
2025-05-11 | Seed1.5-VL Technical Report | Dong Guo et.al. | 2505.07062 | null |
2025-05-21 | FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers | Tianyu Chen et.al. | 2505.06858 | null |
2025-05-11 | The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts | Enric Boix-Adsera et.al. | 2505.06839 | null |
2025-05-10 | Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | Zihan Qiu et.al. | 2505.06708 | link |
2025-05-30 | Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | Dawei Huang et.al. | 2505.06685 | link |
2025-05-10 | QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration | HamidReza Imani et.al. | 2505.06481 | null |
2025-05-06 | A Sensitivity-Driven Expert Allocation Method in LoRA-MoE for Efficient Fine-Tuning | Junzhou Xu et.al. | 2505.06272 | null |
2025-05-12 | FloE: On-the-Fly MoE Inference on Memory-constrained GPU | Yuxin Zhou et.al. | 2505.05950 | null |
2025-05-09 | MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design | Haojie Duanmu et.al. | 2505.05799 | link |
2025-05-10 | SDR-RDMA: Software-Defined Reliability Architecture for Planetary Scale RDMA Communication | Mikhail Khalilov et.al. | 2505.05366 | null |
2025-05-08 | Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts | Ming Li et.al. | 2505.05035 | null |
2025-05-07 | Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs | Yehui Tang et.al. | 2505.04519 | null |
2025-05-07 | SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios | Ning Cheng et.al. | 2505.04201 | null |
2025-05-07 | LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? | Teddy Foley et.al. | 2505.04075 | link |
2025-05-07 | Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications | Yuanai Xie et.al. | 2505.04068 | null |
2025-05-24 | Perception-Informed Neural Networks: Beyond Physics-Informed Neural Networks | Mehran Mazandarani et.al. | 2505.03806 | null |
2025-05-02 | MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance | Xing Hu et.al. | 2505.03804 | null |
2025-05-06 | Towards Smart Point-and-Shoot Photography | Jiawan Li et.al. | 2505.03638 | null |
2025-05-06 | Faster MoE LLM Inference for Extremely Large Models | Haoqi Yang et.al. | 2505.03531 | null |
2025-05-06 | STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation | Maolin Wang et.al. | 2505.03484 | null |
2025-05-06 | 3D Gaussian Splatting Data Compression with Mixture of Priors | Lei Liu et.al. | 2505.03310 | null |
2025-05-05 | Finger Pose Estimation for Under-screen Fingerprint Sensor | Xiongjun Guan et.al. | 2505.02481 | link |
2025-05-05 | Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems | Kai Zhang et.al. | 2505.02381 | null |
2025-05-08 | Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques | Sanjay Surendranath Girija et.al. | 2505.02309 | null |
2025-05-04 | Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | Zhenxing Mi et.al. | 2505.02005 | link |
2025-05-03 | Backdoor Attacks Against Patch-based Mixture of Experts | Cedric Chan et.al. | 2505.01811 | link |
2025-05-01 | MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling | Abdoul Majid O. Thiombiano et.al. | 2505.01459 | null |
2025-05-02 | Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders | Rogelio A Mancisidor et.al. | 2505.01134 | null |
2025-05-02 | CoCoAFusE: Beyond Mixtures of Experts via Model Fusion | Aurelio Raffa Ugolini et.al. | 2505.01105 | null |
2025-05-01 | Improving Routing in Sparse Mixture of Experts with Graph of Tokens | Tam Nguyen et.al. | 2505.00792 | null |
2025-05-01 | CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series | Tian Lan et.al. | 2505.00415 | null |
2025-05-01 | Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing | Piotr Piękos et.al. | 2505.00315 | link |
2025-04-30 | Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders | Xuwei Yang et.al. | 2505.00216 | null |
2025-05-08 | Identifying Critical Dependencies in Large-Scale Continuous Software Engineering | Anastasiia Tkalich et.al. | 2504.21437 | null |
2025-04-29 | TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts | Pradip Kunwar et.al. | 2504.21190 | null |
2025-04-29 | Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization | Shuai Gong et.al. | 2504.21063 | null |
2025-04-26 | PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight | Ben Goertzel et.al. | 2504.21029 | null |
2025-04-29 | In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer | Zechuan Zhang et.al. | 2504.20690 | null |
2025-05-30 | ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting | Yu Zhang et.al. | 2504.20630 | null |
2025-04-29 | MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification | Yichu Xu et.al. | 2504.20509 | null |
2025-04-29 | FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks | Wenjing Xiao et.al. | 2504.20446 | null |
2025-04-29 | MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Amaan Izhar et.al. | 2504.20343 | link |
2025-04-28 | Accelerating Mixture-of-Experts Training with Adaptive Expert Replication | Athinagoras Skiadopoulos et.al. | 2504.19925 | null |
2025-04-28 | DUETS: Setting expectations for asteroseismic binaries and binary products with synthetic populations | A. Mazzi et.al. | 2504.19866 | null |
2025-04-28 | Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey | Yunting Xu et.al. | 2504.19660 | null |
2025-05-04 | ARTEMIS: Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving | Renju Feng et.al. | 2504.19580 | link |
2025-05-30 | Versatile Framework for Song Generation with Prompt-based Control | Yu Zhang et.al. | 2504.19062 | null |
2025-04-29 | BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts | Qingyue Wang et.al. | 2504.18598 | null |
2025-04-25 | NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation | Rob Romijnders et.al. | 2504.18147 | null |
2025-05-15 | TGDT: A Temporal Graph-based Digital Twin for Urban Traffic Corridors | Nooshin Yousefzadeh et.al. | 2504.18008 | null |
2025-06-11 | Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection | Haokai Zhang et.al. | 2504.17834 | link |
2025-04-22 | Compass-V2 Technical Report | Sophia Maria et.al. | 2504.15527 | null |
2025-04-21 | Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images | Jonathan Brokman et.al. | 2504.15470 | link |
2025-04-17 | D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | Haodong Wang et.al. | 2504.15299 | null |
2025-04-23 | MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core | Dennis Liu et.al. | 2504.14960 | null |
2025-04-20 | Evaluating Temporal Plasticity in Foundation Time Series Models for Incremental Fine-tuning | Jia Liu et.al. | 2504.14677 | null |
2025-04-29 | Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning | ByteDance Seed et.al. | 2504.13914 | null |
2025-04-18 | Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts | Jie Zou et.al. | 2504.13655 | null |
2025-04-18 | HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering | Alexander Rusnak et.al. | 2504.13590 | null |
2025-04-18 | Dense Backpropagation Improves Training for Sparse Mixture-of-Experts | Ashwinee Panda et.al. | 2504.12463 | link |
2025-04-16 | Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models | Yuanbo Tang et.al. | 2504.12359 | null |
2025-04-16 | Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data | Sangwon Hyun et.al. | 2504.12287 | null |
2025-04-16 | The Discovery of Two Quadruple Star Systems with the Second and Third Shortest Outer Periods | Brian P. Powell et.al. | 2504.12239 | null |
2025-04-16 | MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models | Hang Yuan et.al. | 2504.12234 | null |
2025-04-13 | Transmission of low energy electrons through a polyethylene terephthalate 800-nm diameter nanocapillary | Li Pengfei et.al. | 2504.11479 | null |
2025-04-15 | Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology | Henrik Häggström et.al. | 2504.11279 | link |
2025-05-22 | Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability | Jiani Liu et.al. | 2504.10804 | null |
2025-04-14 | Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning | LeiLei Ma et.al. | 2504.09990 | null |
2025-04-14 | DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training | Masahiro Tanaka et.al. | 2504.09983 | null |
2025-04-14 | Multi-objective Bayesian Optimization With Mixed-categorical Design Variables for Expensive-to-evaluate Aeronautical Applications | Nathalie Bartoli et.al. | 2504.09930 | null |
2025-04-14 | Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming | Zhiqiang He et.al. | 2504.09906 | null |
2025-04-13 | Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation | Jia Wei et.al. | 2504.09601 | null |
2025-04-12 | MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints | Yichao Yuan et.al. | 2504.09345 | null |
2025-04-12 | Mixture of Group Experts for Learning Invariant Representations | Lei Kang et.al. | 2504.09265 | null |
2025-04-12 | Exploring Modality Disruption in Multimodal Fake News Detection | Moyang Liu et.al. | 2504.09154 | null |
2025-05-08 | RouterKT: Mixture-of-Experts for Knowledge Tracing | Han Liao et.al. | 2504.08989 | null |
2025-03-23 | ExpertRAG: Efficient RAG with Mixture of Experts – Optimizing Context Retrieval for Adaptive LLM Responses | Esmail Gumaan et.al. | 2504.08744 | null |
2025-04-11 | Regularized infill criteria for multi-objective Bayesian optimization with application to aircraft design | Robin Grapin et.al. | 2504.08671 | null |
2025-04-11 | Millions of States: Designing a Scalable MoE Architecture with RWKV-7 Meta-learner | Liu Xiao et.al. | 2504.08247 | null |
2025-04-10 | C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Zhongyang Li et.al. | 2504.07964 | link |
2025-04-11 | Scaling Laws for Native Multimodal Models | Mustafa Shukor et.al. | 2504.07951 | null |
2025-04-10 | Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models | Hongcheng Guo et.al. | 2504.07807 | link |
2025-04-10 | Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network | Peng Jia et.al. | 2504.07777 | null |
2025-04-15 | Kimi-VL Technical Report | Kimi Team et.al. | 2504.07491 | link |
2025-04-09 | MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution | Zhe Wang et.al. | 2504.07308 | link |
2025-04-11 | Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models | Ling Team et.al. | 2504.07158 | null |
2025-05-28 | Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations | Zican Dong et.al. | 2504.06792 | null |
2025-04-24 | FedMerge: Federated Personalization via Model Merging | Shutong Chen et.al. | 2504.06768 | null |
2025-04-08 | S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning | Hanqing Zeng et.al. | 2504.06426 | null |
2025-04-08 | HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Shuzhang Zhong et.al. | 2504.05897 | link |
2025-04-08 | Adaptive Substructure-Aware Expert Model for Molecular Property Prediction | Tianyi Jiang et.al. | 2504.05844 | null |
2025-04-10 | Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations | Ajay Jaiswal et.al. | 2504.05586 | null |
2025-04-07 | SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement | Zuying Xie et.al. | 2504.04818 | null |
2025-04-06 | On the Spatial Structure of Mixture-of-Experts in Transformers | Daniel Bershatsky et.al. | 2504.04444 | null |
2025-04-05 | Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator | Bing Wang et.al. | 2504.04076 | link |
2025-04-04 | HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs | Yongji Wu et.al. | 2504.03871 | null |
2025-04-01 | Detecting Financial Fraud with Hybrid Deep Learning: A Mix-of-Experts Approach to Sequential and Anomalous Patterns | Diego Vallarino et.al. | 2504.03750 | null |
2025-04-01 | A Unified Virtual Mixture-of-Experts Framework: Enhanced Inference and Hallucination Mitigation in Single-Model System | Mingyan Liu et.al. | 2504.03739 | null |
2025-03-26 | A multi-scale lithium-ion battery capacity prediction using mixture of experts and patch-based MLP | Yuzhu Lei et.al. | 2504.03706 | link |
2025-04-04 | RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation | Hanbo Bi et.al. | 2504.03166 | null |
2025-06-01 | TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models | Xinquan Wang et.al. | 2504.02712 | null |
2025-04-07 | MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators | Beichen Huang et.al. | 2504.02658 | link |
2025-04-24 | Cognitive Memory in Large Language Models | Lianlei Shan et.al. | 2504.02441 | null |
2025-04-23 | MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism | Ruidong Zhu et.al. | 2504.02263 | null |
2025-04-20 | Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design | Mohan Zhang et.al. | 2504.01337 | null |
2025-04-01 | Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function | Qiuchen Song et.al. | 2504.00819 | null |
2025-04-01 | DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism | Dengchun Li et.al. | 2504.00661 | link |
2025-04-01 | CFP: Low-overhead Profiling-based Intra-operator Parallelism Generation by Preserving Communication-Free Structures | Weifang Hu et.al. | 2504.00598 | null |
2025-04-01 | Continual Cross-Modal Generalization | Yan Xia et.al. | 2504.00561 | null |
2025-04-01 | Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection | Shunxin Chen et.al. | 2504.00458 | null |
2025-03-31 | Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion | Jiagen Li et.al. | 2503.23721 | null |
2025-05-16 | Mixture of Routers | Jia-Chen Zhang et.al. | 2503.23362 | null |
2025-05-25 | MoLAE: Mixture of Latent Experts for Parameter-Efficient Language Models | Zehua Liu et.al. | 2503.23100 | null |
2025-03-29 | S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning | Giang Do et.al. | 2503.23007 | null |
2025-03-29 | Sparse Mixture of Experts as Unified Competitive Learning | Giang Do et.al. | 2503.22996 | null |
2025-03-26 | Reasoning Beyond Limits: Advances and Open Problems for LLMs | Mohamed Amine Ferrag et.al. | 2503.22732 | null |
2025-04-01 | Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities | Raman Dutt et.al. | 2503.22517 | null |
2025-04-29 | RocketPPA: Ultra-Fast LLM-Based PPA Estimator at Code-Level Abstraction | Armin Abdollahi et.al. | 2503.21971 | null |
2025-05-08 | Binarity at LOw Metallicity (BLOeM): Enhanced multiplicity of early B-type dwarfs and giants at $Z=0.2\,{\rm Z}_\odot$ | J. I. Villaseñor et.al. | 2503.21936 | null |
2025-03-27 | iMedImage Technical Report | Ran Wei et.al. | 2503.21836 | null |
2025-03-27 | LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models | Hengyuan Zhao et.al. | 2503.21227 | null |
2025-05-17 | MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness | Zihao Zheng et.al. | 2503.21135 | null |
2025-03-26 | Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework | Soham Sane et.al. | 2503.20750 | null |
2025-03-26 | UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines | Chen Tang et.al. | 2503.20748 | null |
2025-03-26 | Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning | Sashuai Zhou et.al. | 2503.20633 | null |
2025-04-14 | MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation | Rongyu Zhang et.al. | 2503.20384 | null |
2025-03-26 | Modality-Independent Brain Lesion Segmentation with Privacy-aware Continual Learning | Yousef Sadegheih et.al. | 2503.20326 | link |
2025-03-31 | Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion | Konyul Park et.al. | 2503.19776 | null |
2025-04-30 | BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts | Suzhe Xu et.al. | 2503.19769 | null |
2025-03-25 | M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation | Ziyuan Liu et.al. | 2503.19406 | null |
2025-04-21 | Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design | Rui Xie et.al. | 2503.18869 | null |
2025-04-30 | Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding | Tianyu Chen et.al. | 2503.18578 | null |
2025-03-24 | SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking | Wenrui Cai et.al. | 2503.18338 | null |
2025-04-01 | Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding | Ze Zhang et.al. | 2503.18104 | link |
2025-03-22 | Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM | Codefuse et.al. | 2503.17793 | null |
2025-03-25 | Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts | Yike Yuan et.al. | 2503.16057 | null |
2025-03-21 | UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations | Debabrata Mandal et.al. | 2503.15868 | null |
2025-03-20 | Mixture of Lookup Experts | Shibo Jie et.al. | 2503.15798 | link |
2025-03-21 | Leveraging MoE-based Large Language Model for Zero-Shot Multi-Task Semantic Communication | Sin-Yu Huang et.al. | 2503.15722 | null |
2025-04-29 | SemEval-2025 Task 1: AdMIRe – Advancing Multimodal Idiomaticity Representation | Thomas Pickard et.al. | 2503.15358 | null |
2025-03-21 | Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition | Seungyeon Cho et.al. | 2503.14960 | null |
2025-03-18 | Core-Periphery Principle Guided State Space Model for Functional Connectome Classification | Minheng Chen et.al. | 2503.14655 | null |
2025-03-18 | DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers | Minglei Shi et.al. | 2503.14487 | null |
2025-03-18 | MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts | Runqi Meng et.al. | 2503.14355 | null |
2025-03-18 | Frac-Connections: Fractional Extension of Hyper-Connections | Defa Zhu et.al. | 2503.14125 | null |
2025-03-18 | SNAKE: A Sustainable and Multi-functional Traffic Analysis System utilizing Specialized Large-Scale Models with a Mixture of Experts Architecture | Tian Qin et.al. | 2503.13808 | null |
2025-03-13 | Ensemble Learning for Large Language Models in Text and Code Generation: A Survey | Mari Ashiga et.al. | 2503.13505 | null |
2025-03-17 | Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge | Shengling Qin et.al. | 2503.13421 | null |
2025-05-10 | Channel Estimation for Pinching-Antenna Systems (PASS) | Jian Xiao et.al. | 2503.13268 | null |
2025-03-17 | Federated Mixture-of-Expert for Non-Overlapped Cross-Domain Sequential Recommendation | Yu Liu et.al. | 2503.13254 | null |
2025-05-21 | Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps | Mohammad Al-Jarrah et.al. | 2503.12633 | link |
2025-03-16 | MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts | Harshit et.al. | 2503.12592 | null |
2025-03-16 | MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification | Jianwei Zhao et.al. | 2503.12401 | null |
2025-05-10 | Adaptive Mixture of Low-Rank Experts for Robust Audio Spoofing Detection | Qixian Chen et.al. | 2503.12010 | null |
2025-03-14 | FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-the-World LoRA | Jieming Bian et.al. | 2503.11880 | null |
2025-03-10 | MELON: Multimodal Mixture-of-Experts with Spectral-Temporal Fusion for Long-Term Mobility Estimation in Critical Care | Jiaqing Zhang et.al. | 2503.11695 | null |
2025-03-14 | A Review of DeepSeek Models’ Key Innovative Techniques | Chengen Wang et.al. | 2503.11486 | null |
2025-03-14 | MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling | Rachel S. Y. Teo et.al. | 2503.11144 | link |
2025-03-13 | Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores | Chenpeng Wu et.al. | 2503.10725 | link |
2025-05-19 | dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis | Luyuan Xie et.al. | 2503.10412 | null |
2025-04-10 | Continual Text-to-Video Retrieval with Frame Fusion and Task-Aware Routing | Zecheng Zhao et.al. | 2503.10111 | link |
2025-03-12 | MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching | Tairan Xu et.al. | 2503.09716 | null |
2025-03-12 | Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework | Bakary Badjie et.al. | 2503.09504 | null |
2025-03-12 | Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment | Nazanin Moradinasab et.al. | 2503.09498 | link |
2025-04-01 | Astrea: A MOE-based Visual Understanding Model with Progressive Alignment | Xiaoda Yang et.al. | 2503.09445 | null |
2025-03-12 | Automatic Operator-level Parallelism Planning for Distributed Deep Learning – A Mixed-Integer Programming Approach | Ruifeng She et.al. | 2503.09357 | null |
2025-03-12 | Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference | Mohammad Siavashi et.al. | 2503.09304 | null |
2025-03-13 | FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models | Fufangchen Zhao et.al. | 2503.09158 | null |
2025-03-11 | MoE-Loco: Mixture of Experts for Multitask Locomotion | Runhan Huang et.al. | 2503.08564 | null |
2025-03-11 | BoundarEase: Fostering Constructive Community Engagement to Inform More Equitable Student Assignment Policies | Cassandra Overney et.al. | 2503.08543 | link |
2025-03-11 | Accelerating MoE Model Inference with Expert Sharding | Oana Balmau et.al. | 2503.08467 | null |
2025-03-26 | Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models | Junzhe Li et.al. | 2503.08120 | null |
2025-03-11 | MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models | Han Zhao et.al. | 2503.08007 | null |
2025-03-10 | Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM | Yongqiang Yao et.al. | 2503.07680 | null |
2025-04-01 | TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster | Kanghui Ning et.al. | 2503.07649 | null |
2025-03-05 | BrainNet-MoE: Brain-Inspired Mixture-of-Experts Learning for Neurological Disease Identification | Jing Zhang et.al. | 2503.07640 | null |
2025-03-05 | Mixture of Experts Made Intrinsically Interpretable | Xingyi Yang et.al. | 2503.07639 | null |
2025-03-26 | GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts | Minwen Liao et.al. | 2503.07417 | null |
2025-04-18 | A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications | Siyuan Mu et.al. | 2503.07137 | link |
2025-03-10 | VMTS: Vision-Assisted Teacher-Student Reinforcement Learning for Multi-Terrain Locomotion in Bipedal Robots | Fu Chen et.al. | 2503.07049 | link |
2025-03-10 | ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration | Mengting Ai et.al. | 2503.06881 | link |
2025-03-10 | eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference | Suraiya Tairin et.al. | 2503.06823 | null |
2025-03-09 | MoFE: Mixture of Frozen Experts Architecture | Jean Seo et.al. | 2503.06491 | null |
2025-03-25 | Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models | Nguyen Do et.al. | 2503.06413 | link |
2025-03-08 | MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering | Vinay Kumar Verma et.al. | 2503.06296 | null |
2025-03-08 | A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts | Wenzhuo Du et.al. | 2503.06064 | null |
2025-03-08 | MANDARIN: Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients: Development and Validation of an Acute Brain Dysfunction Prediction Model | Miguel Contreras et.al. | 2503.06059 | null |
2025-03-08 | GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices | Xudong Lu et.al. | 2503.06019 | null |
2025-03-03 | How Do Consumers Really Choose: Exposing Hidden Preferences with the Mixture of Experts Model | Diego Vallarino et.al. | 2503.05800 | null |
2025-03-11 | Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning | Justin Chih-Yao Chen et.al. | 2503.05641 | null |
2025-03-07 | FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework | Jingyu Xu et.al. | 2503.05626 | null |
2025-04-15 | Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts | Weigao Sun et.al. | 2503.05447 | link |
2025-03-10 | Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs | Ling Team et.al. | 2503.05139 | null |
2025-03-07 | Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts | Shwai He et.al. | 2503.05066 | null |
2025-03-06 | Continual Pre-training of MoEs: How robust is your router? | Benjamin Thérien et.al. | 2503.05029 | null |
2025-02-25 | Comparative Analysis Based on DeepSeek, ChatGPT, and Google Gemini: Features, Techniques, Performance, Future Prospects | Anichur Rahman et.al. | 2503.04783 | null |
2025-03-19 | Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining | Houyi Li et.al. | 2503.04715 | null |
2025-03-07 | Question-Aware Gaussian Experts for Audio-Visual Question Answering | Hongyeob Kim et.al. | 2503.04459 | link |
2025-03-19 | Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling | Yan Li et.al. | 2503.04398 | null |
2025-03-06 | A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery | Yiheng Zhu et.al. | 2503.04362 | null |
2025-03-06 | Quantum metric induced magneto-optical effects in $\mathcal{PT}$-symmetric antiferromagnets | Yongpan Li et.al. | 2503.04312 | null |
2025-03-06 | DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval | Yating Liu et.al. | 2503.04144 | null |
2025-03-05 | VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection | Enkhtogtokh Togootogtokh et.al. | 2503.03797 | link |
2025-03-09 | Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs | Haoran Fan et.al. | 2503.03594 | link |
2025-03-05 | Convergence Rates for Softmax Gating Mixture of Experts | Huy Nguyen et.al. | 2503.03213 | null |
2025-03-04 | MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation | Weihang Wang et.al. | 2503.02799 | link |
2025-03-04 | FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting | Congluo Xu et.al. | 2503.02692 | null |
2025-03-06 | Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer | Yujiao Yang et.al. | 2503.02495 | link |
2025-03-04 | Tabby: Tabular Data Synthesis with Language Models | Sonia Cromp et.al. | 2503.02152 | null |
2025-03-03 | ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition | Nastaran Mansourian et.al. | 2503.01750 | null |
2025-03-03 | Effective High-order Graph Representation Learning for Credit Card Fraud Detection | Yao Zou et.al. | 2503.01556 | null |
2025-03-03 | DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models | Yongqi Huang et.al. | 2503.01359 | null |
2025-03-03 | PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation | Linhai Zhang et.al. | 2503.01303 | null |
2025-03-03 | Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting | Xiaobin Hong et.al. | 2503.01157 | null |
2025-03-02 | Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion | Daiki Nishiyama et.al. | 2503.00925 | null |
2025-03-01 | Efficiently Editing Mixture-of-Experts Models with Compressed Experts | Yifei He et.al. | 2503.00634 | null |
2025-03-01 | CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | Tianyu Huai et.al. | 2503.00413 | null |
2025-02-28 | CoSMoEs: Compact Sparse Mixture of Experts | Patrick Huber et.al. | 2503.00245 | null |
2025-02-26 | Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos | Jiamin Luo et.al. | 2503.00049 | null |
2025-03-01 | R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts | Zhongyang Li et.al. | 2502.20395 | link |
2025-02-27 | Mixture of Experts for Recognizing Depression from Interview and Reading Tasks | Loukas Ilias et.al. | 2502.20213 | null |
2025-02-27 | Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems | Zeyi Ren et.al. | 2502.20183 | null |
2025-02-27 | UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook | Yidi Jiang et.al. | 2502.20067 | null |
2025-02-27 | AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs | Xuyang Wei et.al. | 2502.20035 | link |
2025-03-04 | Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Shulai Zhang et.al. | 2502.19811 | link |
2025-02-27 | Extension of SUSY SU(5) GUTs with Nelson-Barr models | Junji Hisano et.al. | 2502.19686 | null |
2025-03-15 | Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization | Taishi Nakamura et.al. | 2502.19261 | null |
2025-02-26 | OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment | Jiaxin Deng et.al. | 2502.18965 | null |
2025-02-26 | Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM | Junxiao Ma et.al. | 2502.18863 | null |
2025-02-25 | Generative AI-enabled Wireless Communications for Robust Low-Altitude Economy Networking | Changyuan Zhao et.al. | 2502.18118 | null |
2025-02-09 | MoEMba: A Mamba-based Mixture of Experts for High-Density EMG-based Hand Gesture Recognition | Mehran Shabanpour et.al. | 2502.17457 | null |
2025-03-17 | The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE | Andrei Chernov et.al. | 2502.17391 | null |
2025-02-24 | Delta Decompression for MoE-based LLMs Compression | Hao Gu et.al. | 2502.17298 | link |
2025-02-24 | Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks | Andrei Chernov et.al. | 2502.17187 | null |
2025-02-24 | Muon is Scalable for LLM Training | Jingyuan Liu et.al. | 2502.16982 | link |
2025-03-07 | BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference | Zewen Jin et.al. | 2502.16927 | null |
2025-02-24 | ENACT-Heart – ENsemble-based Assessment Using CNN and Transformer on Heart Sounds | Jiho Han et.al. | 2502.16914 | null |
2025-02-26 | Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | Chenghao Fan et.al. | 2502.16894 | null |
2025-02-22 | An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning | Masoud Shokrnezhad et.al. | 2502.16198 | null |
2025-02-20 | A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models | Mengyang Sun et.al. | 2502.15828 | link |
2025-03-20 | Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models | Yuan Sun et.al. | 2502.15451 | link |
2025-03-02 | Tight Clusters Make Specialized Experts | Stefan K. Nielsen et.al. | 2502.15315 | link |
2025-02-21 | Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction | Baohang Zhou et.al. | 2502.15290 | link |
2025-02-20 | Ray-Tracing for Conditionally Activated Neural Networks | Claudio Gallicchio et.al. | 2502.14788 | null |
2025-02-21 | ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Zhongyi Zhou et.al. | 2502.14420 | null |
2025-02-19 | MoM: Linear Sequence Modeling with Mixture-of-Memories | Jusen Du et.al. | 2502.13685 | link |
2025-02-19 | Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts | Xin Li et.al. | 2502.13577 | null |
2025-02-18 | MoBA: Mixture of Block Attention for Long-Context LLMs | Enzhe Lu et.al. | 2502.13189 | link |
2025-02-18 | Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models | Gyeongman Kim et.al. | 2502.12947 | null |
2025-03-13 | DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs | Minxuan Lv et.al. | 2502.12455 | null |
2025-02-17 | From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs | Kumari Nishu et.al. | 2502.12325 | null |
2025-02-17 | Binarity at LOw Metallicity (BLOeM): Multiplicity of early B-type supergiants in the Small Magellanic Cloud | N. Britavskiy et.al. | 2502.12239 | null |
2025-02-17 | Accurate Expert Predictions in MoE Inference via Cross-Layer Gate | Zhiyuan Fang et.al. | 2502.12224 | null |
2025-02-17 | How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | Ayan Sengupta et.al. | 2502.12051 | null |
2025-02-17 | Connector-S: A Survey of Connectors in Multi-modal Large Language Models | Xun Zhu et.al. | 2502.11453 | null |
2025-02-16 | Mixture of Tunable Experts – Behavior Modification of DeepSeek-R1 at Inference Time | Robert Dahlke et.al. | 2502.11096 | null |
2025-02-16 | ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models | Shixuan Li et.al. | 2502.11059 | null |
2025-02-15 | Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization | Matthew Lyle Olson et.al. | 2502.10928 | null |
2025-02-11 | MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | Sungnyun Kim et.al. | 2502.10447 | null |
2025-04-03 | Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution | Bowen Chen et.al. | 2502.09654 | null |
2025-02-14 | Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting | Nicholas Dronen et.al. | 2502.09500 | link |
2025-02-12 | The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities | Ning Li et.al. | 2502.08381 | null |
2025-02-12 | Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification | Xuanze Chen et.al. | 2502.08083 | null |
2025-03-09 | Training Sparse Mixture Of Experts Text Embedding Models | Zach Nussbaum et.al. | 2502.07972 | link |
2025-02-11 | Memory Analysis on the Training Course of DeepSeek Models | Ping Zhang et.al. | 2502.07846 | null |
2025-02-11 | LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid | Weigao Sun et.al. | 2502.07563 | link |
2025-02-11 | MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks | Lotfi Abdelkrim Mecharbat et.al. | 2502.07422 | null |
2025-02-11 | Online Aggregation of Trajectory Predictors | Alex Tong et.al. | 2502.07178 | null |
2025-02-09 | Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | Zhiyuan Fang et.al. | 2502.06888 | null |
2025-02-12 | Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach | Xu Zhang et.al. | 2502.06832 | null |
2025-02-10 | MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing | Seokjin Go et.al. | 2502.06643 | null |
2025-02-10 | Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Haiduo Huang et.al. | 2502.06282 | link |
2025-02-10 | Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models | Peiran Wang et.al. | 2502.06094 | null |
2025-02-08 | Mol-MoE: Training Preference-Guided Routers for Molecule Generation | Diego Calanzone et.al. | 2502.05633 | null |
2025-02-17 | UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA | Jiale Dong et.al. | 2502.05602 | link |
2025-02-07 | fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | Hanfei Yu et.al. | 2502.05370 | null |
2025-02-07 | Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts | Roussel Desmond Nzoyem et.al. | 2502.05335 | null |
2025-02-19 | Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient | Jan Ludziejewski et.al. | 2502.05172 | null |
2025-02-06 | Mixture of neural operator experts for learning boundary conditions and model selection | Dwyer Deighan et.al. | 2502.04562 | null |
2025-02-06 | CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Zehua Pei et.al. | 2502.04416 | link |
2025-02-06 | Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning | Peizhuang Cong et.al. | 2502.03884 | null |
2025-03-20 | A Retrospective Systematic Study on Hierarchical Sparse Query Transformer-assisted Ultrasound Screening for Early Hepatocellular Carcinoma | Chaoyin She et.al. | 2502.03772 | link |
2025-02-05 | (GG) MoE vs. MLP on Tabular Data | Andrei Chernov et.al. | 2502.03608 | null |
2025-02-05 | RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts | Tuan Truong et.al. | 2502.03044 | null |
2025-03-22 | On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation | Nghiem T. Diep et.al. | 2502.03029 | null |
2025-02-05 | Scaling Laws for Upcycling Mixture-of-Experts Language Models | Seng Pei Liew et.al. | 2502.03009 | null |
2025-02-04 | ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals | Jianan Nie et.al. | 2502.02748 | null |
2025-02-04 | Binarity at LOw Metallicity (BLOeM): The multiplicity properties and evolution of BAF-type supergiants | L. R. Patrick et.al. | 2502.02644 | null |
2025-02-04 | Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism | Yuhao Qing et.al. | 2502.02581 | null |
2025-02-07 | Brief analysis of DeepSeek R1 and its implications for Generative AI | Sarah Mercer et.al. | 2502.02523 | null |
2025-02-04 | M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference | Nikhil Bhendawade et.al. | 2502.02040 | null |
2025-02-07 | MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation | Haibo Tong et.al. | 2502.01719 | null |
2025-02-27 | Omni-Mol: Exploring Universal Convergent Space for Omni-Molecular Tasks | Chengxin Hu et.al. | 2502.01074 | null |
2025-02-17 | MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs | Yuhang Zhou et.al. | 2502.00997 | null |
2025-02-03 | CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling | Xinze Wang et.al. | 2502.00965 | null |
2025-02-02 | UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs | Yufei He et.al. | 2502.00806 | null |
2025-02-02 | Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective | Yujin Oh et.al. | 2502.00619 | null |
2025-02-05 | Weak-to-Strong Diffusion with Reflection | Lichen Bai et.al. | 2502.00473 | null |
2025-02-01 | PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning | Yu Feng et.al. | 2502.00354 | link |
2025-02-01 | Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective | Fanqi Yan et.al. | 2502.00281 | null |
2025-01-31 | Pheromone-based Learning of Optimal Reasoning Paths | Anirudh Chari et.al. | 2501.19278 | null |
2025-03-03 | Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning | Minh Le et.al. | 2501.18936 | null |
2025-01-30 | MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability | Yan Sun et.al. | 2501.18439 | null |
2025-02-10 | Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework | Jung-Hua Liu et.al. | 2501.17903 | null |
2025-01-29 | Heuristic-Informed Mixture of Experts for Link Prediction in Multilayer Networks | Lucio La Cava et.al. | 2501.17557 | null |
2025-01-28 | 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow | Yueen Ma et.al. | 2501.16698 | null |
2025-01-27 | Searching for GEMS: Discovery and Characterization of Two Brown Dwarfs Around M Dwarfs | Alexander Larsen et.al. | 2501.16554 | null |
2025-02-12 | One-for-All Does Not Work! Enhancing Vulnerability Detection by Mixture-of-Experts (MoE) | Xu Yang et.al. | 2501.16454 | null |
2025-01-18 | Mixture of Experts (MoE): A Big Data Perspective | Wensheng Gan et.al. | 2501.16352 | null |
2025-01-27 | Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference | Yinghan Li et.al. | 2501.16103 | null |
2025-01-25 | ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning | Shangqian Gao et.al. | 2501.15316 | null |
2025-03-16 | FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts | Ziqi Liu et.al. | 2501.15125 | link |
2025-01-25 | Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning | Ziyu Zhao et.al. | 2501.15103 | null |
2025-01-24 | Mean-field limit from general mixtures of experts to quantum neural networks | Anderson Melchor Hernandez et.al. | 2501.14660 | null |
2025-01-30 | Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation | Shengzhe Zhang et.al. | 2501.14269 | link |
2025-03-12 | Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images | Zeyun Deng et.al. | 2501.14198 | null |
2025-01-23 | CSAOT: Cooperative Multi-Agent System for Active Object Tracking | Hy Nguyen et.al. | 2501.13994 | null |
2025-01-22 | Autonomy-of-Experts Models | Ang Lv et.al. | 2501.13074 | null |
2025-02-07 | LLM4WM: Adapting LLM for Wireless Multi-Tasking | Xuanyu Liu et.al. | 2501.12983 | null |
2025-01-22 | UniUIR: Considering Underwater Image Restoration as An All-in-One Learner | Xu Zhang et.al. | 2501.12981 | null |
2025-01-22 | BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR | Guodong Ma et.al. | 2501.12602 | null |
2025-02-26 | Modality Interactive Mixture-of-Experts for Fake News Detection | Yifan Liu et.al. | 2501.12431 | link |
2025-01-21 | SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection | Xiaocheng Zhang et.al. | 2501.12430 | null |
2025-01-25 | Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models | Samira Abnar et.al. | 2501.12370 | null |
2025-01-21 | MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks | Qishen Zhou et.al. | 2501.12281 | link |
2025-02-04 | Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models | Zihan Qiu et.al. | 2501.11873 | null |
2025-01-18 | FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models | Xinglin Pan et.al. | 2501.10714 | null |
2024-12-16 | DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference | Yujie Zhang et.al. | 2501.10375 | null |
2025-01-17 | OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning | Jinyuan Feng et.al. | 2501.10062 | null |
2025-01-17 | LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading | Kuan-Ming Liu et.al. | 2501.09636 | null |
2025-01-16 | MoE$^2$: Optimizing Collaborative Inference for Edge Large Language Models | Lyudong Jin et.al. | 2501.09410 | null |
2025-01-14 | MiniMax-01: Scaling Foundation Models with Lightning Attention | MiniMax et.al. | 2501.08313 | null |
2025-01-14 | Guiding polaritonic energy and momentum through two-dimensional Bravais lattices | Zhonglin Li et.al. | 2501.08123 | null |
2025-02-11 | GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism | Chen Tang et.al. | 2501.07890 | null |
2025-01-18 | PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration | Xiaoshui Huang et.al. | 2501.07762 | null |
2025-01-13 | A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis | Binyu Zhang et.al. | 2501.07016 | link |
2025-01-12 | Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning | Hanwen Zhong et.al. | 2501.06884 | link |
2025-01-12 | A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context | Noureldin Zahran et.al. | 2501.06859 | null |
2025-03-18 | TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning | Yinghao Zhu et.al. | 2501.05661 | link |
2025-01-09 | Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing | Mengfan Liu et.al. | 2501.05313 | null |
2025-01-07 | LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes | Xiang Xu et.al. | 2501.04004 | link |
2025-01-07 | mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training | Xudong Liao et.al. | 2501.03905 | null |
2025-01-08 | Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection | Donatella Genovese et.al. | 2501.03432 | null |
2025-01-06 | Solving the Porous Medium Equation with the eXtreme Mesh deformation approach (X-Mesh) | Alexandre Chemin et.al. | 2501.03083 | null |
2025-01-05 | Soft and Compliant Contact-Rich Hair Manipulation and Care | Uksang Yoo et.al. | 2501.02630 | null |
2025-01-12 | Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning | Zhongyi Zhou et.al. | 2501.02198 | null |
2025-03-18 | MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders | Jiajun Cao et.al. | 2501.01709 | null |
2025-01-01 | REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization | Huyen Nguyen et.al. | 2501.00779 | null |
2025-01-06 | Superposition in Transformers: A Novel Way of Building Mixture of Experts | Ayoub Ben Chaliah et.al. | 2501.00530 | link |
2024-12-31 | CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection | Xiaolei Wang et.al. | 2501.00346 | null |
2024-12-30 | SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection | Yuxuan Li et.al. | 2412.20665 | link |
2024-12-29 | Multimodal Variational Autoencoder: a Barycentric View | Peijie Qiu et.al. | 2412.20487 | null |
2025-03-05 | A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement | Sidra Nasir et.al. | 2412.20468 | null |
2024-12-29 | Mind the Data Gap: Bridging LLMs to Enterprise Data Integration | Moe Kayali et.al. | 2412.20331 | null |
2025-03-09 | UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity | Jingbo Lin et.al. | 2412.20157 | link |
2024-12-28 | Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection | Yaning Zhang et.al. | 2412.20156 | null |
2025-02-18 | DeepSeek-V3 Technical Report | DeepSeek-AI et.al. | 2412.19437 | link |
2024-12-26 | AskChart: Universal Chart Understanding through Textual Enhancement | Xudong Yang et.al. | 2412.19146 | link |
2024-12-30 | Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection | Xiaoyu Huang et.al. | 2412.19108 | null |
2024-12-26 | DAPoinTr: Domain Adaptive Point Transformer for Point Cloud Completion | Yinghui Li et.al. | 2412.19062 | link |
2025-03-10 | Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making | David Shoresh et.al. | 2412.18593 | link |
2024-12-24 | BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing | Yingjie Ma et.al. | 2412.18065 | link |
2024-12-23 | UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition | Li Fu et.al. | 2412.17507 | null |
2025-02-01 | BrainMAP: Learning Multiple Activation Pathways in Brain Networks | Song Wang et.al. | 2412.17404 | link |
2024-12-23 | Efficient fine-tuning methodology of text embedding models for information retrieval: contrastive learning penalty (clp) | Jeongsu Yu et.al. | 2412.17364 | link |
2024-12-22 | The Fermat curves and arrangements of lines and conics | Nils Peder Astrup Toft et.al. | 2412.16993 | null |
2024-12-22 | Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models | Elie Antoine et.al. | 2412.16971 | null |
2024-12-18 | GraphLoRA: Empowering LLMs Fine-Tuning via Graph Collaboration of MoE | Ting Bai et.al. | 2412.16216 | null |
2024-12-20 | Theory of Mixture-of-Experts for Mobile Edge Computing | Hongbo Li et.al. | 2412.15690 | null |
2024-12-19 | MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale | Swapnil Gandhi et.al. | 2412.15411 | null |
2025-01-03 | Qwen2.5 Technical Report | Qwen et.al. | 2412.15115 | link |
2025-02-27 | ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing | Ziteng Wang et.al. | 2412.14711 | link |
2025-01-22 | A Survey on Inference Optimization Techniques for Mixture of Experts Models | Jiacheng Liu et.al. | 2412.14219 | link |
2024-12-18 | SEKE: Specialised Experts for Keyword Extraction | Matej Martinc et.al. | 2412.14087 | link |
2024-12-18 | MedCoT: Medical Chain of Thought via Hierarchical Expert | Jiaxiang Liu et.al. | 2412.13736 | link |
2024-12-17 | SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks | Mátyás Vincze et.al. | 2412.13053 | null |
2024-12-17 | Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Moritz Reuss et.al. | 2412.12953 | null |
2025-01-09 | CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition | He Wang et.al. | 2412.12760 | null |
2024-12-16 | Investigating Mixture of Experts in Dense Retrieval | Effrosyni Sokli et.al. | 2412.11864 | null |
2024-12-20 | Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture | Jingze Shi et.al. | 2412.11834 | link |
2024-12-16 | Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation | Svetlana Pavlitska et.al. | 2412.11608 | link |
2024-12-16 | Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture | Jingyu Xu et.al. | 2412.11557 | null |
2024-12-14 | DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification | Yuhao Wang et.al. | 2412.10650 | link |
2024-12-13 | DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Zhiyu Wu et.al. | 2412.10302 | link |
2024-12-13 | Llama 3 Meets MoE: Efficient Upcycling | Aditya Vavre et.al. | 2412.09952 | link |
2024-12-20 | Memory Layers at Scale | Vincent-Pierre Berges et.al. | 2412.09764 | link |
2025-01-10 | Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Xiaoshuang Huang et.al. | 2412.09278 | link |
2024-12-12 | MoSLD: An Extremely Parameter-Efficient Mixture-of-Shared LoRAs for Multi-Task Learning | Lulu Zhao et.al. | 2412.08946 | null |
2024-11-26 | Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection | Tzu-Ting Yang et.al. | 2412.08651 | null |
2025-01-18 | Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective | Minh Le et.al. | 2412.08285 | null |
2025-02-12 | Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification | Xuanze Chen et.al. | 2412.08193 | link |
2024-12-10 | MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning | Yufei Ma et.al. | 2412.07405 | null |
2024-12-10 | Post-Training Statistical Calibration for Higher Activation Sparsity | Vui Seng Chua et.al. | 2412.07174 | link |
2025-03-02 | MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | Yao Fu et.al. | 2412.07067 | null |
2024-12-07 | Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts | Arturo Rodriguez et.al. | 2412.06842 | null |
2024-12-09 | Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset | Xiao Wang et.al. | 2412.06647 | link |
2024-12-09 | UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts | Zhen Wan et.al. | 2412.06340 | null |
2024-12-08 | Hallucination-aware Optimization for Large Language Model-empowered Communications | Yinqiu Liu et.al. | 2412.06007 | link |
2024-12-10 | An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism | Qing Zhang et.al. | 2412.05821 | null |
2024-12-10 | RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts | Xu Liu et.al. | 2412.05679 | link |
2024-12-07 | SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts | Gengze Zhou et.al. | 2412.05552 | link |
2024-12-07 | Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers | Boxun Xu et.al. | 2412.05540 | null |
2024-12-23 | Steps are all you need: Rethinking STEM Education with Prompt Engineering | Krishnasai Addala et.al. | 2412.05023 | null |
2024-12-05 | Customize Segment Anything Model for Multi-Modal Semantic Segmentation with Mixture of LoRA Experts | Chenyang Zhu et.al. | 2412.04220 | null |
2025-03-02 | Monet: Mixture of Monosemantic Experts for Transformers | Jungwoo Park et.al. | 2412.04139 | link |
2024-12-05 | Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks | Zhaoyang Liu et.al. | 2412.03850 | null |
2024-12-04 | Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond | Loukas Ilias et.al. | 2412.03483 | null |
2024-12-03 | CA-MoE: Channel-Adapted MoE for Incremental Weather Forecasting | Hao Chen et.al. | 2412.02503 | null |
2025-02-14 | MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption | Siddhant Dutta et.al. | 2412.01858 | null |
2025-01-22 | Yi-Lightning Technical Report | Alan Wake et.al. | 2412.01253 | null |
2024-11-30 | Mixture of Experts for Node Classification | Yu Shi et.al. | 2412.00418 | null |
2025-01-22 | HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting | Shaohan Yu et.al. | 2412.00316 | null |
2024-11-27 | Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference | Andrii Skliar et.al. | 2412.00099 | null |
2025-02-16 | Condense, Don’t Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning | Mingyu Cao et.al. | 2412.00069 | link |
2024-11-29 | LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References | Shuguo Jiang et.al. | 2411.19758 | null |
2024-11-28 | On the effectiveness of discrete representations in sparse mixture of experts | Giang Do et.al. | 2411.19402 | null |
2024-11-28 | Bayesian Cluster Weighted Gaussian Models | Panagiotis Papastamoulis et.al. | 2411.18957 | link |
2024-11-27 | UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS | Haomin Zhuang et.al. | 2411.18797 | null |
2024-11-27 | Complexity Experts are Task-Discriminative Learners for Any Image Restoration | Eduard Zamfir et.al. | 2411.18466 | null |
2024-11-27 | Mixture of Experts in Image Classification: What’s the Sweet Spot? | Mathurin Videau et.al. | 2411.18322 | null |
2024-11-26 | $H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs | Selim Furkan Tekin et.al. | 2411.17792 | link |
2024-11-26 | The Tempered Finite Element Method | Antoine Quiriny et.al. | 2411.17564 | null |
2024-11-25 | Staleness-Centric Optimizations for Efficient Diffusion MoE Inference | Jiajun Luo et.al. | 2411.16786 | null |
2024-11-29 | MH-MoE: Multi-Head Mixture-of-Experts | Shaohan Huang et.al. | 2411.16205 | null |
2024-11-25 | LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy | Peng Cui et.al. | 2411.16095 | null |
2024-11-24 | Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution | Haiquan Wang et.al. | 2411.15871 | null |
2024-11-24 | LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Xiaoye Qu et.al. | 2411.15708 | link |
2024-11-23 | Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts | Qizhou Chen et.al. | 2411.15432 | null |
2024-11-23 | Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation | Fahao Chen et.al. | 2411.15419 | null |
2024-11-21 | Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning | Jiange Yang et.al. | 2411.14519 | null |
2024-11-20 | MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification | Yuxuan Chen et.al. | 2411.13004 | null |
2024-11-23 | KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning | Ming Yin et.al. | 2411.12950 | null |
2025-02-06 | Ultra-Sparse Memory Network | Zihao Huang et.al. | 2411.12364 | null |
2025-01-28 | CNMBERT: A Model for Converting Hanyu Pinyin Abbreviations to Chinese Characters | Zishuo Feng et.al. | 2411.11770 | link |
2024-11-18 | MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs | Shiyi Cao et.al. | 2411.11217 | null |
2024-11-16 | Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | Jinqiang Long et.al. | 2411.10669 | link |
2024-11-15 | Weakly-Supervised Multimodal Learning on MIMIC-CXR | Andrea Agostini et.al. | 2411.10356 | link |
2024-11-21 | Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models | Wei Wang et.al. | 2411.10003 | null |
2024-11-13 | Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection | Vima Gupta et.al. | 2411.08982 | null |
2024-11-13 | Sparse Upcycling: Inference Inefficient Finetuning | Sasha Doubov et.al. | 2411.08968 | null |
2024-11-13 | LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing | Xiaonan Nie et.al. | 2411.08446 | null |
2024-11-12 | Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach | Renzi Wang et.al. | 2411.08232 | null |
2024-11-12 | PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model | Yilun Liu et.al. | 2411.08212 | null |
2024-11-08 | Biodynamic Analysis of Alpine Skiing with a Skier-Ski-Snow Interaction Model | Nan Gao et.al. | 2411.08056 | null |
2024-11-12 | Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge | Emmanuel Azuh Mensah et.al. | 2411.07834 | null |
2024-11-11 | Adaptive Conditional Expert Selection Network for Multi-domain Recommendation | Kuiyao Dong et.al. | 2411.06826 | null |
2024-11-11 | WDMoE: Wireless Distributed Mixture of Experts for Large Language Models | Nan Xue et.al. | 2411.06681 | null |
2024-11-09 | Learning Mixtures of Experts with EM | Quentin Fruytier et.al. | 2411.06056 | null |
2024-11-08 | NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts | Yen-Ting Lin et.al. | 2411.05945 | null |
2024-11-05 | DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts | Zelin Yao et.al. | 2411.03025 | link |
2024-11-05 | Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts | Yuan Xie et.al. | 2411.02787 | null |
2024-11-27 | SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models | Jianyi Zhang et.al. | 2411.02433 | link |
2024-11-06 | Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Xingwu Sun et.al. | 2411.02265 | null |
2024-12-27 | FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation | Ziwei Zhan et.al. | 2411.02115 | null |
2024-11-06 | Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis | Mohammad Zbeeb et.al. | 2411.01929 | link |
2025-02-10 | RS-MoE: A Vision-Language Model with Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering | Hui Lin et.al. | 2411.01595 | null |
2025-02-10 | Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation | Mingrui Liu et.al. | 2411.01457 | null |
2024-11-06 | HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference | Peng Tang et.al. | 2411.01433 | null |
2024-12-12 | HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy | Shuqing Luo et.al. | 2411.01288 | link |
2024-11-02 | PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment | Dongxu Liu et.al. | 2411.01245 | null |
2024-11-01 | MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition | Cheng Yang et.al. | 2411.01016 | null |
2024-11-01 | LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Nam V. Nguyen et.al. | 2411.00918 | link |
2024-10-16 | TradExpert: Revolutionizing Trading with Mixture of Expert LLMs | Qianggang Ding et.al. | 2411.00782 | null |
2024-11-01 | MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffic-Aware Parallel Optimization | Jingming Guo et.al. | 2411.00662 | link |
2024-11-01 | A Fast, Analytic Empirical Model of the Gaia Data Release 3 Astrometric Orbit Catalog Selection Function | Casey Y. Lam et.al. | 2411.00654 | link |
2024-10-31 | Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts | Xiang Deng et.al. | 2410.23836 | null |
2024-10-30 | Efficient and Interpretable Grammatical Error Correction with Mixture of Experts | Muhammad Reza Qorib et.al. | 2410.23507 | link |
2024-10-30 | Stealing User Prompts from Mixture of Experts | Itay Yona et.al. | 2410.22884 | null |
2024-10-30 | MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning | Xujia Wang et.al. | 2410.22782 | null |
2025-02-08 | ProMoE: Fast MoE-based LLM Serving using Proactive Caching | Xiaoniu Song et.al. | 2410.22134 | null |
2024-10-29 | Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging | Li Shen et.al. | 2410.21804 | null |
2024-10-29 | Neural Experts: Mixture of Experts for Implicit Neural Representations | Yizhak Ben-Shabat et.al. | 2410.21643 | null |
2024-11-07 | FinTeamExperts: Role Specialized MOEs For Financial Analysis | Yue Yu et.al. | 2410.21338 | null |
2024-10-28 | Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving | Jiyao Wang et.al. | 2410.21086 | null |
2024-10-27 | Towards a Blockchain and Opportunistic Edge Driven Metaverse of Everything | Paula Fraga-Lamas et.al. | 2410.20594 | null |
2024-10-27 | Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation | Maohao Shen et.al. | 2410.20336 | null |
2024-10-27 | GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields | Yusuke Sekikawa et.al. | 2410.20306 | null |
2024-11-12 | LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Jiazuo Yu et.al. | 2410.20178 | link |
2024-10-25 | DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unsupervised Dimensionality Reduction | Zelin Zang et.al. | 2410.19504 | link |
2025-01-27 | Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis | Weikai Li et.al. | 2410.19225 | link |
2024-10-24 | Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Ruisi Cai et.al. | 2410.19123 | link |
2024-10-24 | Mixture of Parrots: Experts improve memorization more than reasoning | Samy Jelassi et.al. | 2410.19034 | null |
2024-10-24 | MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases | Zhisheng Lin et.al. | 2410.18406 | null |
2024-10-23 | Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches | Kexin Feng et.al. | 2410.18298 | null |
2024-10-23 | MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Jingfan Zhang et.al. | 2410.18035 | null |
2024-10-23 | ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | Xin He et.al. | 2410.17954 | null |
2024-10-23 | Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition | Artem Basharin et.al. | 2410.17765 | null |
2024-10-22 | Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling | Jialong Li et.al. | 2410.17043 | null |
2024-10-21 | LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | Ruikun Zhang et.al. | 2410.16095 | link |
2024-10-22 | CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts | Zhenpeng Su et.al. | 2410.16077 | link |
2024-10-29 | Generalizing Motion Planners with Mixture of Experts for Autonomous Driving | Qiao Sun et.al. | 2410.15774 | link |
2024-11-23 | ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts | Xumeng Han et.al. | 2410.15732 | null |
2024-10-20 | Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs | Xin Zhou et.al. | 2410.15438 | null |
2024-11-16 | LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration | Yuang Ai et.al. | 2410.15385 | link |
2024-10-19 | MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning | Suning Huang et.al. | 2410.14972 | null |
2024-10-29 | Collaboratively adding new knowledge to an LLM | Rhui Dih Lee et.al. | 2410.14753 | link |
2024-10-18 | MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts | Rachel S. Y. Teo et.al. | 2410.14574 | link |
2024-10-18 | Towards a Simple and Extensible Standard for Object-Centric Event Data (OCED) – Core Model, Design Space, and Lessons Learned | Dirk Fahland et.al. | 2410.14495 | link |
2024-10-18 | ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction | Haoyu He et.al. | 2410.14099 | link |
2024-10-17 | Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks | Jinze Zhao et.al. | 2410.13964 | null |
2024-10-18 | MoR: Mixture of Ranks for Low-Rank Adaptation Tuning | Chuanyu Tang et.al. | 2410.13408 | null |
2024-10-16 | Satellite-Terrestrial Quantum Networks and the Global Quantum Internet | Andrea Conti et.al. | 2410.13096 | null |
2024-10-16 | On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs | Herun Wan et.al. | 2410.12600 | null |
2024-10-16 | Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion | Minkyoung Cho et.al. | 2410.12592 | null |
2024-10-16 | Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts | Fanqi Yan et.al. | 2410.12258 | null |
2025-01-03 | EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference | Yulei Qian et.al. | 2410.12247 | null |
2024-10-15 | MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router | Yanyue Xie et.al. | 2410.12013 | null |
2024-10-15 | MoH: Multi-Head Attention as Mixture-of-Head Attention | Peng Jin et.al. | 2410.11842 | link |
2024-10-15 | GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation | Fei Tang et.al. | 2410.11841 | link |
2024-10-15 | Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models | James Vo et.al. | 2410.11654 | null |
2024-10-16 | Quadratic Gating Functions in Mixture of Experts: A Statistical Insight | Pedram Akbarian et.al. | 2410.11222 | null |
2024-10-19 | AT-MoE: Adaptive Task-planning Mixture of Experts via LoRA Approach | Xurui Li et.al. | 2410.10896 | null |
2024-10-01 | Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models | Keivan Alizadeh et.al. | 2410.10846 | null |
2024-10-16 | Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free | Ziyue Li et.al. | 2410.10814 | link |
2024-10-14 | Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts | Guorui Zheng et.al. | 2410.10626 | link |
2024-10-14 | Learning to Ground VLMs without Forgetting | Aritra Bhowmik et.al. | 2410.10491 | null |
2024-10-14 | Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Xu Liu et.al. | 2410.10469 | null |
2024-10-15 | Ada-K Routing: Boosting the Efficiency of MoE-based LLMs | Tongtian Yue et.al. | 2410.10456 | null |
2024-10-14 | Tighter Risk Bounds for Mixtures of Experts | Wissam Akretche et.al. | 2410.10397 | null |
2024-10-24 | Scalable Multi-Domain Adaptation of Language Models using Modular Experts | Peter Schafhalter et.al. | 2410.10181 | null |
2024-10-16 | Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models | Jun Luo et.al. | 2410.10114 | null |
2024-10-14 | AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality | Peijun Qing et.al. | 2410.10054 | link |
2024-10-13 | ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL | Zhanqiu Guo et.al. | 2410.09781 | null |
2024-10-13 | MoIN: Mixture of Introvert Experts to Upcycle an LLM | Ajinkya Tejankar et.al. | 2410.09687 | null |
2024-10-12 | GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks | Dingyi Zhuang et.al. | 2410.09570 | null |
2024-10-11 | Semi-Supervised Learning of Noisy Mixture of Experts Models | Oh-Ran Kwon et.al. | 2410.09039 | null |
2024-10-11 | Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering | I-Chun Chen et.al. | 2410.08589 | null |
2024-10-31 | Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Sukwon Yun et.al. | 2410.08245 | link |
2024-11-20 | Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | Gen Luo et.al. | 2410.08202 | null |
2024-10-10 | Efficient Dictionary Learning with Switch Sparse Autoencoders | Anish Mudide et.al. | 2410.08201 | link |
2024-10-18 | More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing | Sagi Shaier et.al. | 2410.08003 | null |
2024-10-10 | SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture | Jiayi Han et.al. | 2410.07739 | null |
2024-10-10 | Upcycling Large Language Models into Mixture of Experts | Ethan He et.al. | 2410.07524 | null |
2024-10-09 | User Feedback in Continuous Software Engineering: Revealing the State-of-Practice | Anastasiia Tkalich et.al. | 2410.07459 | null |
2024-10-09 | MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Peng Jin et.al. | 2410.07348 | link |
2024-10-04 | A Dynamic Approach to Stock Price Prediction: Comparing RNN and Mixture of Experts Models Across Different Volatility Profiles | Diego Vallarino et.al. | 2410.07234 | null |
2024-10-09 | Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders | David Noever et.al. | 2410.06462 | null |
2024-10-09 | Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs | Ruijia Niu et.al. | 2410.06431 | null |
2024-10-08 | Probing the Robustness of Theory of Mind in Large Language Models | Christian Nickel et.al. | 2410.06271 | null |
2024-10-08 | MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | Wei Huang et.al. | 2410.06270 | link |
2024-12-17 | Aria: An Open Multimodal Native Mixture-of-Experts Model | Dongxu Li et.al. | 2410.05993 | link |
2024-10-08 | Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models | Siqi Wang et.al. | 2410.05661 | null |
2024-12-05 | Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | Xinyu Zhao et.al. | 2410.05357 | link |
2024-10-07 | Multimodal Fusion Strategies for Mapping Biophysical Landscape Features | Lucia Gordon et.al. | 2410.04833 | link |
2024-10-06 | Realizing Video Summarization from the Path of Language-based Semantic Understanding | Kuan-Chen Mu et.al. | 2410.04511 | null |
2024-10-09 | Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding | Wei Wu et.al. | 2410.03553 | null |
2024-10-04 | Exploring the Benefit of Activation Sparsity in Pre-training | Zhengyan Zhang et.al. | 2410.03440 | link |
2024-10-03 | MLP-KAN: Unifying Deep Representation and Function Learning | Yunhong He et.al. | 2410.03027 | link |
2024-10-03 | On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions | Huy Nguyen et.al. | 2410.02935 | null |
2024-10-03 | Neutral residues: revisiting adapters for model extension | Franck Signe Talla et.al. | 2410.02744 | null |
2024-10-03 | Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping | Ziye Huang et.al. | 2410.02475 | null |
2024-10-03 | MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction | Zhaojian Yu et.al. | 2410.02241 | null |
2024-10-03 | Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts | Minh Le et.al. | 2410.02200 | null |
2024-10-04 | Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices | Andres Potapczynski et.al. | 2410.02117 | link |
2024-10-04 | EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing | Haotian Sun et.al. | 2410.02098 | null |
2024-10-02 | Don’t flatten, tokenize! Unlocking the key to SoftMoE’s efficacy in deep RL | Ghada Sokar et.al. | 2410.01930 | null |
2024-09-15 | Integrating AI’s Carbon Footprint into Risk Management Frameworks: Strategies and Tools for Sustainable Compliance in Banking Sector | Nataliya Tkachenko et.al. | 2410.01818 | null |
2024-10-02 | Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Shayekh Bin Islam et.al. | 2410.01782 | link |
2024-10-02 | TIC 290061484: A Triply Eclipsing Triple System with the Shortest Known Outer Period of 24.5 Days | Veselin B. Kostov et.al. | 2410.01711 | null |
2024-10-02 | Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging | Tingfeng Hui et.al. | 2410.01610 | null |
2024-10-02 | The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs | Hong Li et.al. | 2410.01417 | null |
2024-10-01 | MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards | Sheng Wang et.al. | 2410.00938 | null |
2024-10-01 | UniAdapt: A Universal Adapter for Knowledge Calibration | Tai D. Nguyen et.al. | 2410.00454 | null |
2024-10-01 | Robust Traffic Forecasting against Spatial Shift over Years | Hongjun Wang et.al. | 2410.00373 | link |
2024-09-29 | IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method | Chaohui Xu et.al. | 2410.00059 | null |
2024-09-30 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Haotian Zhang et.al. | 2409.20566 | null |
2024-09-30 | HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models | Bingshen Mu et.al. | 2409.19878 | null |
2024-10-02 | CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Jihai Zhang et.al. | 2409.19291 | link |
2024-11-12 | SciDFM: A Large Language Model with Mixture-of-Experts for Science | Liangtai Sun et.al. | 2409.18412 | null |
2024-11-01 | Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Xun Zhu et.al. | 2409.17508 | link |
2024-09-26 | A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction | Guangyu Wang et.al. | 2409.17440 | link |
2024-09-24 | Leveraging Mixture of Experts for Improved Speech Deepfake Detection | Viola Negroni et.al. | 2409.16077 | null |
2024-10-02 | Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts | Xiaoming Shi et.al. | 2409.16040 | link |
2024-10-31 | Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM | Fengrun Zhang et.al. | 2409.15905 | null |
2024-09-24 | Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks | Jiayi He et.al. | 2409.15695 | null |
2024-12-13 | A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts | Hugo Inzirillo et.al. | 2409.15161 | link |
2024-09-23 | Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond | Hong Chen et.al. | 2409.14993 | null |
2024-09-21 | Routing in Sparsely-gated Language Models responds to Context | Stefan Arnold et.al. | 2409.14107 | null |
2024-10-01 | On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists | Dongyang Fan et.al. | 2409.13931 | link |
2024-09-20 | Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning | Annette Spooner et.al. | 2409.13791 | null |
2024-09-19 | On the rationality problem for hypersurfaces | Jan Lange et.al. | 2409.12834 | null |
2024-09-19 | Retrieval-Augmented Test Generation: How Far Are We? | Jiho Shin et.al. | 2409.12682 | null |
2024-09-19 | Robust Audiovisual Speech Recognition Models with Mixture-of-Experts | Yihan Wu et.al. | 2409.12370 | null |
2024-09-18 | Mixture of Diverse Size Experts | Manxi Sun et.al. | 2409.12210 | null |
2024-09-18 | GRIN: GRadient-INformed MoE | Liyuan Liu et.al. | 2409.12136 | null |
2024-09-18 | Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0 | Zhiyong Wang et.al. | 2409.11909 | null |
2024-09-17 | LPT++: Efficient Training on Mixture of Long-tailed Experts | Bowen Dong et.al. | 2409.11323 | null |
2024-12-09 | LOLA – An Open-Source Massively Multilingual Large Language Model | Nikit Srivastava et.al. | 2409.11272 | link |
2024-09-16 | Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression | Yi-Hsin Li et.al. | 2409.10101 | null |
2024-11-20 | MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Enming Zhang et.al. | 2409.07267 | link |
2024-09-10 | DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models | Maryam Akhavan Aghdam et.al. | 2409.06669 | null |
2024-09-10 | STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | Jaeseong Lee et.al. | 2409.06211 | null |
2024-10-31 | VE: Modeling Multivariate Time Series Correlation with Variate Embedding | Shangjiong Wang et.al. | 2409.06169 | link |
2024-09-09 | Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models | Hongyang Lei et.al. | 2409.05929 | null |
2024-09-09 | Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks | Bo Xu et.al. | 2409.05726 | null |
2024-09-09 | Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection | Tianwu Lei et.al. | 2409.05611 | null |
2024-09-06 | Hot Stars in the GALEX Ultraviolet Sky Surveys (GUVcat_AISxSDSS_HS) and the Binary Fraction of Hot Evolved Stars | Luciana Bianchi et.al. | 2409.04626 | null |
2024-09-05 | Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions | Zemian Ke et.al. | 2409.03282 | null |
2024-09-05 | ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding | Zhengzhuo Xu et.al. | 2409.03277 | null |
2024-09-05 | xLAM: A Family of Large Action Models to Empower AI Agent Systems | Jianguo Zhang et.al. | 2409.03215 | link |
2024-09-04 | Configurable Foundation Models: Building LLMs from a Modular Perspective | Chaojun Xiao et.al. | 2409.02877 | null |
2024-09-04 | Pluralistic Salient Object Detection | Xuelu Feng et.al. | 2409.02368 | null |
2024-09-03 | OLMoE: Open Mixture-of-Experts Language Models | Niklas Muennighoff et.al. | 2409.02060 | link |
2024-09-05 | Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model | Hukai Huang et.al. | 2409.02050 | null |
2024-09-03 | BEAVER: An Enterprise Benchmark for Text-to-SQL | Peter Baile Chen et.al. | 2409.02038 | null |
2024-09-03 | Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information | Xinyu Zhang et.al. | 2409.01605 | null |
2024-09-02 | Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning | Soumajyoti Sarkar et.al. | 2409.01483 | null |
2024-09-02 | Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching | Sungmin Yun et.al. | 2409.01141 | null |
2024-09-04 | Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack | Guanzhong Chen et.al. | 2409.00960 | link |
2024-09-02 | Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts | Youngseog Chung et.al. | 2409.00879 | null |
2024-09-11 | Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts | Rhui Dih Lee et.al. | 2408.17280 | null |
2024-08-29 | Gradient-free variational learning with conditional mixture networks | Conor Heins et.al. | 2408.16429 | link |
2024-09-07 | Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | Yuncheng Yang et.al. | 2408.15915 | link |
2024-08-28 | Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts | Nikolas Gritsch et.al. | 2408.15901 | null |
2024-10-23 | LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Fangxun Shu et.al. | 2408.15881 | link |
2024-08-28 | Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts | Lean Wang et.al. | 2408.15664 | null |
2024-08-27 | Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis | Sakhinana Sagar Srinivas et.al. | 2408.15305 | null |
2024-08-28 | A Survey of Large Language Models for European Languages | Wazir Ali et.al. | 2408.15040 | null |
2024-08-27 | MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce | Hao Jiang et.al. | 2408.14968 | null |
2024-08-24 | Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings | Sagar Srinivas Sakhinana et.al. | 2408.13622 | null |
2024-09-11 | Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler | Yikang Shen et.al. | 2408.13359 | null |
2024-10-30 | The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities | Venkatesh Balavadhani Parthasarathy et.al. | 2408.13296 | null |
2024-08-23 | Guiding IoT-Based Healthcare Alert Systems with Large Language Models | Yulan Gao et.al. | 2408.13071 | null |
2024-08-23 | O-Mamba: O-shape State-Space Model for Underwater Image Enhancement | Chenyu Dong et.al. | 2408.12816 | link |
2024-08-23 | DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation | Xiaowei Mao et.al. | 2408.12809 | null |
2024-08-23 | Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth | Yuxiang Wei et.al. | 2408.12803 | null |
2024-08-23 | La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection | Hang Zou et.al. | 2408.12793 | null |
2024-10-02 | SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging | Mohammadreza Pourreza et.al. | 2408.12733 | null |
2024-08-22 | Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Jamba Team et.al. | 2408.12570 | null |
2024-09-09 | Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators | Dingkang Yang et.al. | 2408.12325 | null |
2024-08-15 | FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models | Zhongyu Zhao et.al. | 2408.11855 | link |
2024-08-21 | MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing | Hao Zhou et.al. | 2408.11396 | link |
2024-08-21 | KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? | Xiao Han et.al. | 2408.11306 | link |
2024-08-21 | FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts | Hanzi Mei et.al. | 2408.11304 | null |
2024-08-27 | Unboxing Occupational Bias: Grounded Debiasing of LLMs with U.S. Labor Data | Atmika Gorti et.al. | 2408.11247 | null |
2024-08-25 | Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting | Jianxiang Zhou et.al. | 2408.10822 | link |
2024-08-20 | AnyGraph: Graph Foundation Model in the Wild | Lianghao Xia et.al. | 2408.10700 | link |
2024-08-20 | HMoE: Heterogeneous Mixture of Experts for Language Modeling | An Wang et.al. | 2408.10681 | null |
2024-08-19 | AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference | Shuzhang Zhong et.al. | 2408.10284 | link |
2024-10-29 | FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models | Xiaochen Wang et.al. | 2408.10276 | link |
2024-08-26 | SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | Anke Tang et.al. | 2408.10174 | link |
2024-11-01 | Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Xiaoyu Kong et.al. | 2408.10159 | link |
2024-08-19 | A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method | Hang Zou et.al. | 2408.09752 | null |
2024-08-16 | Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection | Haohao Zhu et.al. | 2408.08551 | null |
2024-08-17 | BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts | Qizhen Zhang et.al. | 2408.08274 | null |
Speculative Decoding
Publish Date | Title | Authors | PDF | Code |
---|---|---|---|---|
2025-06-26 | Small Encoders Can Rival Large Decoders in Detecting Groundedness | Istabrak Abbes et.al. | 2506.21288 | null |
2025-06-26 | You never have enough J/$ψ$ events: the case for a J/$ψ$ factory | Stephen Lars Olsen et.al. | 2506.20975 | null |
2025-06-17 | Utility-Driven Speculative Decoding for Mixture-of-Experts | Anish Saxena et.al. | 2506.20675 | null |
2025-06-26 | Charged rotating quantum black holes | Dyuman Bhattacharya et.al. | 2506.19941 | null |
2025-06-23 | Entangled Quantum Negative Energy Teleportation as a Probe of Semiclassical Gravity | Daniel S. Zachary et.al. | 2506.19878 | null |
2025-06-24 | Scaling Speculative Decoding with Lookahead Reasoning | Yichao Fu et.al. | 2506.19830 | null |
2025-06-23 | LLMs on a Budget? Say HOLA | Zohaib Hasan Siddiqui et.al. | 2506.18952 | null |
2025-06-23 | The Full Nonlinear Vortex Tube-Vorton Method: the post-stall condition | Jesus Carlos Pimentel-Garcia et.al. | 2506.18719 | null |
2025-06-17 | Semantic uncertainty in advanced decoding methods for LLM generation | Darius Foodeei et.al. | 2506.17296 | null |
2025-06-20 | Capturing Misalignment | Pierfrancesco Guarino et.al. | 2506.17176 | null |
2025-06-20 | ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models | Bin Chen et.al. | 2506.16712 | null |
2025-06-26 | Rethinking LLM Training through Information Geometry and Quantum Metrics | Riccardo Di Sipio et.al. | 2506.15830 | null |
2025-06-15 | $\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts | Mert Cemri et.al. | 2506.15733 | null |
2025-06-18 | CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies | Donghyun Gouk et.al. | 2506.15601 | null |
2025-06-18 | PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction | Shufan Li et.al. | 2506.15556 | null |
2025-06-17 | Optimistic MEV in Ethereum Layer 2s: Why Blockspace Is Always in Demand | Ozan Solmaz et.al. | 2506.14768 | null |
2025-06-17 | S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models | Tao He et.al. | 2506.14158 | null |
2025-06-16 | Stimulus Motion Perception Studies Imply Specific Neural Computations in Human Visual Stabilization | David W Arathorn et.al. | 2506.13506 | null |
2025-06-21 | Exploring the Secondary Risks of Large Language Models | Jiawei Chen et.al. | 2506.12382 | null |
2025-06-14 | Quantum Machine Learning | Muhammad Usman et.al. | 2506.12292 | null |
2025-06-13 | Fluid-induced snap-through instability of spherical shells | Pier Giuseppe Ledda et.al. | 2506.12247 | null |
2025-06-13 | Eliciting Reasoning in Language Models with Cognitive Tools | Brown Ebouky et.al. | 2506.12115 | null |
2025-06-12 | SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding | Ziyi Zhang et.al. | 2506.11309 | null |
2025-06-11 | Speculative Design in Spiraling Time: Methods and Indigenous HCI | James Eschrich et.al. | 2506.10229 | null |
2025-06-11 | V455 Car: an oscillating eclipsing Algol-type binary in triple star system | Zhao-Long Deng et.al. | 2506.10124 | null |
2025-06-11 | Patterns of Patterns III | Joseph Corneli et.al. | 2506.09696 | null |
2025-06-20 | SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving | Xiangchen Li et.al. | 2506.09397 | null |
2025-06-11 | A collection of results relating the geometry of plane domains and the exit time of planar Brownian motion, II | Greg Markowsky et.al. | 2506.09364 | null |
2025-06-10 | Draft-based Approximate Inference for LLMs | Kevin Galim et.al. | 2506.08373 | link |
2025-06-10 | Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/7})$ Second-Order Oracle Complexity | Lesi Chen et.al. | 2506.08362 | null |
2025-06-09 | MiniCPM4: Ultra-Efficient LLMs on End Devices | MiniCPM Team et.al. | 2506.07900 | link |
2025-06-09 | FREESS: An Educational Simulator of a RISC-V-Inspired Superscalar Processor Based on Tomasulo’s Algorithm | Roberto Giorgi et.al. | 2506.07665 | link |
2025-06-09 | LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments | Jin Huang et.al. | 2506.07416 | null |
2025-06-08 | Exploiting Inaccurate Branch History in Side-Channel Attacks | Yuhui Zhu et.al. | 2506.07263 | null |
2025-06-07 | Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit | Charles Goddard et.al. | 2506.06607 | null |
2025-06-06 | Fake Friends and Sponsored Ads: The Risks of Advertising in Conversational Search | Jacob Erickson et.al. | 2506.06447 | null |
2025-06-06 | Disk Evolution Study Through Imaging of Nearby Young Stars (DESTINYS): Evidence of planet-disk interaction in the 2MASSJ16120668-3010270 system | C. Ginski et.al. | 2506.05892 | null |
2025-06-10 | Gumbel-max List Sampling for Distribution Coupling with Multiple Samples | Joseph Rowan et.al. | 2506.05632 | null |
2025-06-05 | Accelerated Test-Time Scaling with Model-Free Speculative Sampling | Woomin Song et.al. | 2506.04708 | null |
2025-06-04 | Guided Speculative Inference for Efficient Test-Time Alignment of LLMs | Jonathan Geuter et.al. | 2506.04118 | link |
2025-06-04 | The Causal-Noncausal Tail Processes: An Introduction | Christian Gouriéroux et.al. | 2506.04046 | null |
2025-06-04 | AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism | Zhepei Wei et.al. | 2506.03700 | link |
2025-06-04 | POSS: Position Specialist Generates Better Draft for Speculative Decoding | Langlin Huang et.al. | 2506.03566 | link |
2025-06-02 | Out-of-Vocabulary Sampling Boosts Speculative Decoding | Nadav Timor et.al. | 2506.03206 | null |
2025-06-03 | Feedstack: Layering Structured Representations over Unstructured Feedback to Scaffold Human AI Conversation | Hannah Vy Nguyen et.al. | 2506.03052 | null |
2025-06-03 | Reuse or Generate? Accelerating Code Editing via Edit-Oriented Speculative Decoding | Peiding Wang et.al. | 2506.02780 | null |
2025-06-04 | Multi Layered Autonomy and AI Ecologies in Robotic Art Installations | Baoyang Chen et.al. | 2506.02606 | null |
2025-06-03 | Consultant Decoding: Yet Another Synergistic Mechanism | Chuanghao Ding et.al. | 2506.02391 | null |
2025-06-02 | Radiation GRMHD Models of Accretion onto Stellar-Mass Black Holes: I. Survey of Eddington Ratios | Lizhong Zhang et.al. | 2506.02289 | null |
2025-05-16 | SpecMemo: Speculative Decoding is in Your Pocket | Selin Yildirim et.al. | 2506.01986 | null |
2025-05-16 | Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism | Yuhao Shen et.al. | 2506.01979 | null |
2025-06-02 | Synchronic Web Digital Identity: Speculations on the Art of the Possible | Thien-Nam Dinh et.al. | 2506.01856 | null |
2025-06-02 | Playing with Transformer at 30+ FPS via Next-Frame Diffusion | Xinle Cheng et.al. | 2506.01380 | null |
2025-06-02 | Shape Shifting Light Dark Matter Solitons | Dor Ben-Amotz et.al. | 2506.01282 | null |
2025-06-01 | The $M_{\rm BH}-M_\star$ Relation of the hyperluminous Dust-obscured Quasars up to $z \sim 4$ | Yibin Luo et.al. | 2506.01218 | null |
2025-06-01 | Mamba Drafters for Speculative Decoding | Daewon Choi et.al. | 2506.01206 | null |
2025-06-01 | The Inverse Scaling Effect of Pre-Trained Language Model Surprisal Is Not Due to Data Leakage | Byung-Doh Oh et.al. | 2506.01172 | null |
2025-05-31 | Accelerating Diffusion LLMs via Adaptive Parallel Decoding | Daniel Israel et.al. | 2506.00413 | null |
2025-05-31 | Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively | Jiawei Gu et.al. | 2506.00396 | link |
2025-05-30 | Cross-Attention Speculative Decoding | Wei Zhong et.al. | 2505.24544 | null |
2025-05-30 | CLaSp: In-Context Layer Skip for Self-Speculative Decoding | Longze Chen et.al. | 2505.24196 | null |
2025-06-10 | Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism | Jinhui Wei et.al. | 2505.23219 | null |
2025-05-28 | Pre-Training Curriculum for Multi-Token Prediction in Language Models | Ansar Aynetdinov et.al. | 2505.22757 | link |
2025-05-28 | Mass-feeding of jet-launching white dwarfs in grazing and common envelope evolution | Noam Soker et.al. | 2505.22621 | null |
2025-05-29 | Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design | Yudi Zhang et.al. | 2505.22179 | link |
2025-05-28 | RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding | Yuichiro Hoshino et.al. | 2505.22135 | null |
2025-05-28 | Robust and Symmetric Magnetic Field Dependency of Superconducting Diode Effect in Asymmetric Dirac Semimetal SQUIDs | H. C. Travaglini et.al. | 2505.21861 | null |
2025-05-27 | Computocene: Notes from an Age of Observation | Simone Severini et.al. | 2505.21744 | null |
2025-05-27 | Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits | Yeshwanth Venkatesha et.al. | 2505.21594 | null |
2025-05-27 | Hardware-Efficient Attention for Fast Decoding | Ted Zadouri et.al. | 2505.21487 | null |
2025-05-27 | Pair binding and Hund’s rule breaking in high-symmetry fullerenes | R. Rausch et.al. | 2505.21455 | null |
2025-05-28 | Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity | Yehui Tang et.al. | 2505.21411 | null |
2025-05-27 | Repeated Auctions with Speculators: Arbitrage Incentives and Forks in DAOs | Nicolas Eschenbaum et.al. | 2505.21296 | null |
2025-05-27 | SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences | Jungyoub Cha et.al. | 2505.20776 | link |
2025-05-27 | Replication of Reference-Dependent Preferences and the Risk-Return Trade-Off in the Chinese Market | Penggan Xu et.al. | 2505.20608 | null |
2025-05-26 | Academic Research Output Derivatives: Structuring Futures and Options on Research Output Index | Amarendra Sharma et.al. | 2505.20492 | null |
2025-05-26 | Bounded cohomology, quotient extensions, and hierarchical hyperbolicity | Francesco Fournier-Facio et.al. | 2505.20462 | null |
2025-05-26 | HAMburger: Accelerating LLM Inference via Token Smashing | Jingyu Liu et.al. | 2505.20438 | null |
2025-05-23 | Reinforcement Speculative Decoding for Fast Ranking | Yingpeng Du et.al. | 2505.20316 | null |
2025-06-13 | MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE | Zongle Huang et.al. | 2505.19645 | null |
2025-05-28 | Faster and Better LLMs via Latency-Aware Test-Time Scaling | Zili Wang et.al. | 2505.19634 | null |
2025-06-25 | Turing Test 2.0: The General Intelligence Threshold | Georgios Mappouras et.al. | 2505.19550 | null |
2025-05-29 | DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding | Yunhai Hu et.al. | 2505.19201 | link |
2025-05-25 | Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs | Xuan Zhang et.al. | 2505.19155 | null |
2025-05-24 | Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding | Yixuan Wang et.al. | 2505.18629 | null |
2025-05-23 | VeriThinker: Learning to Verify Makes Reasoning Model Efficient | Zigeng Chen et.al. | 2505.17941 | link |
2025-05-20 | Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency | Ruixiao Li et.al. | 2505.17074 | null |
2025-05-16 | SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs | Jinwoo Park et.al. | 2505.17052 | null |
2025-05-22 | KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization | Mingbo Song et.al. | 2505.16162 | null |
2025-05-21 | Strong Hilbert space fragmentation and fractons from subsystem and higher-form symmetries | Charles Stahl et.al. | 2505.15889 | null |
2025-05-21 | Quasinormal Modes of Schwarzschild Black Holes in the Dehnen-(1, 4, 5/2) Type Dark Matter Halos | Qi-Qi Liang et.al. | 2505.15540 | null |
2025-06-03 | Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding | Zijian Lin et.al. | 2505.15380 | null |
2025-05-21 | SSR: Speculative Parallel Scaling Reasoning in Test-time | Yuanlin Chu et.al. | 2505.15340 | null |
2025-05-21 | BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms | Yunlong Hou et.al. | 2505.15141 | null |
2025-05-20 | STree: Speculative Tree Decoding for Hybrid State-Space Models | Yangchao Wu et.al. | 2505.14969 | null |
2025-05-20 | On the Day They Experience: Awakening Self-Sovereign Experiential AI Agents | Botao Amber Hu et.al. | 2505.14893 | null |
2025-05-20 | Unremarkable to Remarkable AI Agent: Exploring Boundaries of Agent Intervention for Adults With and Without Cognitive Impairment | Mai Lee Chang et.al. | 2505.14872 | null |
2025-05-20 | X-ray properties of compact elliptical galaxies | Orsolya E. Kovacs et.al. | 2505.14768 | null |
2025-05-20 | Speculative Decoding Reimagined for Multimodal Large Language Models | Luxi Lin et.al. | 2505.14260 | link |
2025-05-19 | Language and Thought: The View from LLMs | Daniel Rothschild et.al. | 2505.13561 | null |
2025-05-19 | HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding | Siran Liu et.al. | 2505.13254 | null |
2025-05-19 | Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification | Jikai Wang et.al. | 2505.13204 | null |
2025-05-19 | FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference | Guangda Liu et.al. | 2505.13109 | null |
2025-05-25 | FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | Zihua Wang et.al. | 2505.12728 | link |
2025-05-18 | Traversal Verification for Speculative Tree Decoding | Yepeng Weng et.al. | 2505.12398 | null |
2025-05-16 | FAIR Ecosystems for Science at Scale | Sean R. Wilkinson et.al. | 2505.11742 | null |
2025-05-16 | Prime Number Error Terms | Nathan Ng et.al. | 2505.11295 | null |
2025-05-16 | Beyond surfaces: quantifying internal radiative heat transport in dense materials | Janak Tiwari et.al. | 2505.10853 | null |
2025-05-16 | Qualia Optimization | Philip S. Thomas et.al. | 2505.10779 | null |
2025-05-15 | Anchoring AI Capabilities in Market Valuations: The Capability Realization Rate Model and Valuation Misalignment Risk | Xinmin Fang et.al. | 2505.10590 | null |
2025-05-18 | MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models | Mugilan Ganesan et.al. | 2505.10526 | null |
2025-05-21 | SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices | Xiangwen Zhuge et.al. | 2505.10259 | link |
2025-05-14 | Chandra Rules Out Super-Eddington Accretion For Little Red Dots | Andrea Sacchi et.al. | 2505.09669 | null |
2025-06-01 | Folded State Dynamics – A Geometric and Deterministic Origin of Irreversibility | Patrick BarAvi et.al. | 2505.09650 | null |
2025-05-14 | Observational study of the formation of homologous confined circular-ribbon flares | Shuhong Yang et.al. | 2505.09093 | null |
2025-05-13 | Long timescale numerical simulations of large, super-critical accretion discs | P. Chris Fragile et.al. | 2505.08859 | null |
2025-05-13 | Kudzu: Fast and Simple High-Throughput BFT | Victor Shoup et.al. | 2505.08771 | null |
2025-05-13 | Automatic Task Detection and Heterogeneous LLM Speculative Decoding | Danying Ge et.al. | 2505.08600 | null |
2025-05-12 | GUP Effective Metric Without GUP: Implications for the Sign of GUP Parameter and Quantum Bounce | Yen Chin Ong et.al. | 2505.07972 | null |
2025-05-12 | Localized Gravity, de Sitter, and the Horizon Criterion | Bjoern Friedrich et.al. | 2505.07934 | null |
2025-06-22 | TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking | Ching Nam Hang et.al. | 2505.07891 | null |
2025-05-08 | Scaling Laws for Speculative Decoding | Siyuan Yan et.al. | 2505.07858 | null |
2025-05-12 | SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models | Hang Wu et.al. | 2505.07680 | null |
2025-05-10 | N-body simulations of the Self-Confinement of Viscous Self-Gravitating Narrow Eccentric Planetary Ringlets | Joseph M. Hahn et.al. | 2505.06639 | null |
2025-05-09 | FastDup: a scalable duplicate marking tool using speculation-and-test mechanism | Zhonghai Zhang et.al. | 2505.06127 | link |
2025-05-08 | A Physics Model for Origin of Life | Paul Howard Frampton et.al. | 2505.05634 | null |
2025-05-08 | Memory Under Siege: A Comprehensive Survey of Side-Channel Attacks on Memory | MD Mahady Hassan et.al. | 2505.04896 | null |
2025-05-08 | Topological phase transition to a hidden charge density wave liquid | Joshua S. H. Lee et.al. | 2505.04867 | null |
2025-05-07 | SOAEsV2-7B/72B: Full-Pipeline Optimization for State-Owned Enterprise LLMs via Continual Pre-Training, Domain-Progressive SFT and Distillation-Enhanced Speculative Decoding | Jingyang Deng et.al. | 2505.04723 | null |
2025-05-06 | Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation | Hengyuan Hu et.al. | 2505.03983 | null |
2025-05-06 | QiMeng-CPU-v2: Automated Superscalar Processor Design by Learning Data Dependencies | Shuyao Cheng et.al. | 2505.03195 | null |
2025-05-04 | The quest for explosive bubbles in the Indonesian Rupiah/US exchange rate: Does the uncertainty trinity matter? | Abdul Khaliq et.al. | 2505.02869 | null |
2025-05-24 | Accelerating Large Language Model Reasoning via Speculative Search | Zhihai Wang et.al. | 2505.02865 | null |
2025-05-21 | Dirac Singleton as a Relativistic Field Beyond Standard Model | M. A. Vasiliev et.al. | 2505.01915 | null |
2025-05-03 | Speculative Evolution Through 3D Cellular Automata | Amir Hossein Khazaei et.al. | 2505.01692 | null |
2025-05-02 | PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding | Bradley McDanel et.al. | 2505.01572 | null |
2025-05-12 | Emotions in Artificial Intelligence | Hermann Borotschnig et.al. | 2505.01462 | null |
2025-04-29 | X-ray Spectroscopy via Temporal Decomposition | William Setterberg et.al. | 2504.21169 | null |
2025-04-29 | Ground to Dust: Collisional Cascades and the Fate of Kardashev II Megaswarms | Brian C. Lacki et.al. | 2504.21151 | null |
2025-06-10 | EvoPort: An Evolutionary Framework for Portfolio Optimization via Randomized Alpha Discovery and Ensemble-Based Allocation | Nguyen Van Thanh et.al. | 2504.21095 | null |
2025-04-29 | Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding | Gabe Guo et.al. | 2504.20456 | link |
2025-04-28 | AutoJudge: Judge Decoding Without Manual Annotation | Roman Garipov et.al. | 2504.20039 | null |
2025-04-27 | Detecting speculative data flow vulnerabilities using weakest precondition reasoning | Graeme Smith et.al. | 2504.19128 | null |
2025-05-25 | Efficient Reasoning for LLMs through Speculative Chain-of-Thought | Jikai Wang et.al. | 2504.19095 | link |
2025-04-26 | Global Simulations of Gravitational Instability in Protostellar Disks with Full Radiation Transport II. Locality of Gravitoturbulence, Clumpy Spirals, and Implications for Observable Substructure | Wenrui Xu et.al. | 2504.18751 | null |
2025-06-15 | PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation | Zihao An et.al. | 2504.18583 | null |
2025-04-25 | Generalizing the relativistic precession model of quasi-periodic oscillations through anharmonic corrections | Roberto Giambò et.al. | 2504.18403 | null |
2025-04-23 | A Vision for AI-Driven Adaptation of Dynamic AR Content to Users and Environments | Julian Rasch et.al. | 2504.16562 | null |
2025-04-23 | Hardness of Median and Center in the Ulam Metric | Nick Fischer et.al. | 2504.16437 | null |
2025-04-22 | On commuting integer matrices | Jonathan Chapman et.al. | 2504.15839 | null |
2025-04-22 | Delayed Keen Model with Inflation | Ali Tolga Dincer et.al. | 2504.15819 | null |
2025-04-21 | Speculative Sampling via Exponential Races | Szymon Kobus et.al. | 2504.15475 | null |
2025-05-16 | Rendezvous in CAVITY: Kinematics and gas properties of an isolated dwarf-dwarf merging pair in a cosmic void region | Bahar Bidaran et.al. | 2504.15359 | null |
2025-04-21 | The phase diagram of CeRh$_2$As$_2$ for out-of-plane magnetic field | P. Khanenko et.al. | 2504.15112 | null |
2025-04-21 | Safety Co-Option and Compromised National Security: The Self-Fulfilling Prophecy of Weakened AI Risk Thresholds | Heidy Khlaaf et.al. | 2504.15088 | null |
2025-04-21 | Note on Type $III_1$ Algebras in $c=1$ String Theory and Bulk Causal Diamonds | T. Banks et.al. | 2504.15076 | null |
2025-04-21 | Exploring Collaborative GenAI Agents in Synchronous Group Settings: Eliciting Team Perceptions and Design Considerations for the Future of Work | Janet G. Johnson et.al. | 2504.14779 | null |
2025-05-27 | BLACKOUT: Data-Oblivious Computation with Blinded Capabilities | Hossam ElAtali et.al. | 2504.14654 | null |
2025-04-25 | UFO2: The Desktop AgentOS | Chaoyun Zhang et.al. | 2504.14603 | link |
2025-04-20 | An interstellar mission to test astrophysical black holes | Cosimo Bambi et.al. | 2504.14576 | null |
2025-04-19 | Charge Densities in Crystals and Triply-Periodic Minimal Surfaces | Mengdi Yin et.al. | 2504.14148 | null |
2025-04-18 | Going Whole Hog: A Philosophical Defense of AI Cognition | Herman Cappelen et.al. | 2504.13988 | null |
2025-04-16 | From job titles to jawlines: Using context voids to study generative AI systems | Shahan Ali Memon et.al. | 2504.13947 | null |
2025-03-21 | Bio-crafting Architecture: Experiences of growing mycelium in minimal surface molds | Anca-Simona Horvath et.al. | 2504.13855 | null |
2025-05-28 | The Sky as a Killing Horizon | Níckolas de Aguiar Alves et.al. | 2504.12514 | null |
2025-04-12 | Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time | Wang Yang et.al. | 2504.12329 | link |
2025-04-18 | Large Language Model-Based Knowledge Graph System Construction for Sustainable Development Goals: An AI-Based Speculative Design Perspective | Yi-De Lin et.al. | 2504.12309 | null |
2025-04-16 | Purposefully Induced Psychosis (PIP): Embracing Hallucination as Imagination in Large Language Models | Kris Pilcher et.al. | 2504.12012 | null |
2025-04-16 | Who Said Only Military Officers Can Deal with Uncertainty? On the Importance of Uncertainty in EdTech Data Visualisations | Felicitas Macgilchrist et.al. | 2504.11974 | null |
2025-04-15 | Five dimensional rotating and Quintessence black hole and their shadows | Milko Estrada et.al. | 2504.11408 | null |
2025-04-16 | Efficient Distributed Retrieval-Augmented Generation for Enhancing Language Model Performance | Shangyu Liu et.al. | 2504.11197 | null |
2025-04-14 | Shield Bash: Abusing Defensive Coherence State Retrieval to Break Timing Obfuscation | Kartik Ramkrishnan et.al. | 2504.10318 | null |
2025-04-14 | Gravitational metamaterials from optical properties of spacetime media | Orlando Luongo et.al. | 2504.09987 | null |
2025-04-12 | Authoritarian Recursions: How Fiction, History, and AI Reinforce Control in Education, Warfare, and Discourse | Hasan Oguz et.al. | 2504.09030 | null |
2025-04-11 | SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting | Jiaming Xu et.al. | 2504.08850 | null |
2025-05-31 | SD$^2$: Self-Distilled Sparse Drafters | Mike Lasby et.al. | 2504.08838 | null |
2025-04-05 | SLOs-Serve: Optimized Serving of Multi-SLO LLMs | Siyuan Chen et.al. | 2504.08784 | null |
2025-04-11 | Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices | Shengyuan Ye et.al. | 2504.08242 | null |
2025-05-16 | SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning | Rui Pan et.al. | 2504.07891 | link |
2025-04-10 | Synthetic Fluency: Hallucinations, Confabulations, and the Creation of Irish Words in LLM-Generated Translations | Sheila Castilho et.al. | 2504.07680 | null |
2025-04-10 | Proceedings of the Purposeful XR Workshop for CHI 2025 | Elizabeth Childs et.al. | 2504.07475 | null |
2025-04-09 | Joint Survey Processing. III. Compact Oddballs in the COSMOS Field – Little Red Dots and Transients | Yu-Heng Lin et.al. | 2504.07196 | null |
2025-04-09 | ShadowBinding: Realizing Effective Microarchitectures for In-Core Secure Speculation Schemes | Amund Bergland Kvalsvik et.al. | 2504.07018 | null |
2025-04-08 | SPIRe: Boosting LLM Inference Throughput with Speculative Decoding | Sanjit Neelam et.al. | 2504.06419 | null |
2025-04-08 | Decoding the Ishango Bone: Unveiling Prehistoric Mathematical Art | Jenny Baur et.al. | 2504.06412 | null |
2025-04-08 | Interplay between trimer structure and magnetic ground state in Ba5Ru3O12 probed by Neutron and muSR techniques | E. Kushwaha et.al. | 2504.06113 | null |
2025-04-08 | Strong Evidence That Abiogenesis Is a Rapid Process on Earth Analogs | David Kipping et.al. | 2504.05993 | null |
2025-04-08 | DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding | Hossein Entezari Zarch et.al. | 2504.05598 | null |
2025-06-03 | Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution | Raffi Khatchadourian et.al. | 2504.05424 | null |
2025-04-06 | pc-COP: An Efficient and Configurable 2048-p-Bit Fully-Connected Probabilistic Computing Accelerator for Combinatorial Optimization | Kiran Magar et.al. | 2504.04543 | null |
2025-06-02 | Representations of $p$-adic groups and orbits with smooth closure in a variety of Langlands parameters | Kristaps Balodis et.al. | 2504.04163 | null |
2025-04-05 | PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models | Haofei Yin et.al. | 2504.04104 | null |
2025-03-23 | Agentic Business Process Management: The Past 30 Years And Practitioners’ Future Perspectives | Hoang Vu et.al. | 2504.03693 | null |
2025-04-04 | Ethics Readiness of Technology: The case for aligning ethical approaches with technological maturity | Eline de Jong et.al. | 2504.03336 | null |
2025-04-03 | A Review of Prototyping in XR: Linking Extended Reality to Digital Fabrication | Bixun Chen et.al. | 2504.02998 | null |
2025-05-02 | GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation | Zhiyuan Yan et.al. | 2504.02782 | link |
2025-04-03 | Black Holes, Moduli Stabilisation and the Swampland | Matilda Delgado et.al. | 2504.02645 | null |
2025-04-08 | Variational Online Mirror Descent for Robust Learning in Schrödinger Bridge | Dong-Sig Han et.al. | 2504.02618 | null |
2025-06-16 | Graviton Scattering on Gravitational Atoms: Relic Graviton Shot Noise | Benjamin Avila-Lopez et.al. | 2504.01286 | null |
2025-04-01 | Reminiscences about Steven Weinberg (This Time it’s Personal) | C. P. Burgess et.al. | 2504.01118 | null |
2025-04-01 | Mesoscale Eddy – Internal Wave Coupling. III. The End of the Enstrophy Cascade and Maintenance of Gyre Scale Potential Vorticity Gradients | Kurt L. Polzin et.al. | 2504.00486 | null |
2025-04-01 | The Impact of Triangular-Toothed Gears on the Functionality of the Antikythera Mechanism | Esteban Guillermo Szigety y Gustavo Francisco Arenas et.al. | 2504.00327 | null |
2025-06-04 | Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding | Aayush Gautam et.al. | 2504.00030 | null |
2025-03-31 | What the F*ck Is Artificial General Intelligence? | Michael Timothy Bennett et.al. | 2503.23923 | null |
2025-03-31 | A search for the three isomers of cyano-1,3-butadiene in TMC-1: Implications for bottom-up routes involving 1,3-butadiene | M. Agundez et.al. | 2503.23841 | null |
2025-03-30 | Credit, Land Speculation, and Low-Interest-Rate Policy | Tomohiro Hirano et.al. | 2503.23552 | null |
2025-03-30 | The Longest Duration SGRE Event in Solar Cycle 25 | Nat Gopalswamy et.al. | 2503.23544 | null |
2025-03-30 | Speculative End-Turn Detector for Efficient Speech Chatbot Assistant | Hyunjong Ok et.al. | 2503.23439 | null |
2025-03-29 | Beyond speculation: Measuring the growing presence of LLM-generated texts in multilingual disinformation | Dominik Macko et.al. | 2503.23242 | null |
2025-03-28 | Formation and Evolution of Compact Binaries Containing Intermediate Mass Black Holes in Dense Star Clusters | Seungjae Lee et.al. | 2503.22109 | null |
2025-03-27 | How to Constrain the Stochastic Gravitational Wave Background with Multi-Frequency Detections | Eleanor Gleave et.al. | 2503.21508 | null |
2025-03-26 | Speculations on higher Fukaya categories | James Pascaleff et.al. | 2503.20906 | null |
2025-03-24 | The Centers and Margins of Modeling Humans in Well-being Technologies: A Decentering Approach | Jichen Zhu et.al. | 2503.19132 | null |
2025-05-14 | Spectropolarimetry of A Nuclear Transient AT2023clx: Revealing The Geometrical Alignment between The Transient Outflow and The Nuclear Dusty Region | Kohki Uno et.al. | 2503.19024 | null |
2025-03-23 | A Novel Hat-Shaped Device-Cloud Collaborative Inference Framework for Large Language Models | Zuan Xie et.al. | 2503.18989 | null |
2025-03-23 | A Multi-Model Adaptation of Speculative Decoding for Classification | Somnath Roy et.al. | 2503.18076 | null |
2025-03-20 | SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs | Shibo Jie et.al. | 2503.16163 | null |
2025-03-20 | “This could save us months of work” – Use Cases of AI and Automation Support in Investigative Journalism | Besjon Cifliku et.al. | 2503.16011 | null |
2025-03-20 | SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models | Fahao Chen et.al. | 2503.15921 | null |
2025-03-19 | Prada: Black-Box LLM Adaptation with Private Data on Resource-Constrained Devices | Ziyao Wang et.al. | 2503.14932 | null |
2025-06-12 | The Origin of the Very-High-Energy Diffuse $γ$-Ray Emission: The Case for Galactic Source Cocoons | Antonio Ambrosone et.al. | 2503.14651 | null |
2025-05-04 | Superconductivity in magnetars: Exploring type-I and type-II states in toroidal magnetic fields | Mayusree Das et.al. | 2503.14594 | null |
2025-03-26 | Association of 220 PeV Neutrino KM3-230213A with Gamma-Ray Bursts | Ruiqi Wang et.al. | 2503.14471 | null |
2025-03-18 | Neutron portal to ultra-high-energy neutrinos | Gustavo F. S. Alves et.al. | 2503.14419 | null |
2025-03-18 | Speculative Decoding for Verilog: Speed and Quality, All in One | Changran Xu et.al. | 2503.14153 | null |
2025-03-18 | Growing a Twig to Accelerate Large Vision-Language Models | Zhenwei Shao et.al. | 2503.14075 | null |
2025-03-17 | ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts | Evangelos Georganas et.al. | 2503.13565 | null |
2025-03-17 | Enhanced anomalous Hall effect in the topological Kagome metal Cs(V$_{1-x}$Mn$_x$)$_3$Sb$_5$ | Xinmin Wang et.al. | 2503.13351 | null |
2025-03-28 | WOW: Workflow-Aware Data Movement and Task Scheduling for Dynamic Scientific Workflows | Fabian Lehmann et.al. | 2503.13072 | link |
2025-05-15 | Collaborative Speculative Inference for Efficient LLM Inference Serving | Luyao Gao et.al. | 2503.10325 | null |
2025-03-13 | Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding | Jinze Li et.al. | 2503.10135 | null |
2025-03-12 | A practical guide to machine learning interatomic potentials – Status and future | Ryan Jacobs et.al. | 2503.09814 | null |
2025-03-11 | In Search of the Potentially Hazardous Asteroids in the Taurid Resonant Swarm | Jasmine Li et.al. | 2503.08670 | null |
2025-03-11 | Liquidity Competition Between Brokers and an Informed Trader | Ryan Donnelly et.al. | 2503.08287 | null |
2025-03-25 | Training Domain Draft Models for Speculative Decoding: Best Practices and Insights | Fenglu Hong et.al. | 2503.07807 | null |
2025-03-10 | Did smartphones break the world as we knew it? | Mikhail V. Tamm et.al. | 2503.07773 | null |
2025-03-13 | Design as Hope: Reimagining Futures for Seemingly Doomed Problems | JaeWon Kim et.al. | 2503.07586 | null |
2025-03-09 | A parallel parser for regular expressions | Angelo Borsotti et.al. | 2503.06763 | null |
2025-03-07 | Quantum-like cognition and decision making in the light of quantum measurement theory | Miho Fuyama et.al. | 2503.05859 | null |
2025-02-25 | Generative Artificial Intelligence: Evolving Technology, Growing Societal Impact, and Opportunities for Information Systems Research | Veda C. Storey et.al. | 2503.05770 | null |
2025-03-07 | Speculative Decoding for Multi-Sample Inference | Yiwei Li et.al. | 2503.05330 | null |
2025-03-07 | SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding | Kaiyu Huang et.al. | 2503.05096 | null |
2025-02-11 | Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations | Kunal Handa et.al. | 2503.04761 | null |
2025-03-19 | Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling | Yan Li et.al. | 2503.04398 | null |
2025-03-06 | A possible jet and corona configuration for Swift J1727.8–1613 during the hard state | Jing-Qiang Peng et.al. | 2503.04044 | null |
2025-03-05 | RASD: Retrieval-Augmented Speculative Decoding | Guofeng Quan et.al. | 2503.03434 | null |
2025-03-26 | SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling | Cunchi Lv et.al. | 2503.02550 | null |
2025-04-02 | Linear Representations of Political Perspective Emerge in Large Language Models | Junsol Kim et.al. | 2503.02080 | link |
2025-04-23 | EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test | Yuhui Li et.al. | 2503.01840 | link |
2025-03-03 | Efficient Long-Term Structural Reliability Estimation with Non-Gaussian Stochastic Models: A Design of Experiments Approach | Sebastian Winter et.al. | 2503.01566 | null |
2025-03-17 | MeshPad: Interactive Sketch-Conditioned Artist-Designed Mesh Generation and Editing | Haoxuan Li et.al. | 2503.01425 | null |
2025-03-24 | Turbulence in virtual: II. Origin of skewness and dual fraction processes | Xunchuan Liu et.al. | 2503.01160 | null |
2025-03-02 | DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting | Kai Lv et.al. | 2503.00784 | link |
2025-03-02 | Speculative Ad-hoc Querying | Haoyu Li et.al. | 2503.00714 | link |
2025-03-01 | Tutorial Proposal: Speculative Decoding for Efficient LLM Inference | Heming Xia et.al. | 2503.00491 | null |
2025-03-01 | Peek into the ‘White-Box’: A Field Study on Bystander Engagement with Urban Robot Uncertainty | Xinyan Yu et.al. | 2503.00337 | null |
2025-03-01 | Doraemon’s Gadget Lab: Unpacking Human Needs and Interaction Design in Speculative Technology | Tram Thi Minh Tran et.al. | 2503.00257 | null |
2025-02-28 | Broadband pulsed quadrature measurements with calorimeters | Ezad Shojaee et.al. | 2503.00188 | null |
2025-02-28 | AMuLeT: Automated Design-Time Testing of Secure Speculation Countermeasures | Bo Fu et.al. | 2503.00145 | link |
2025-02-28 | Assessment of universal relations among second-order moments of relativistic stars via reformulated perturbation equations | Koutarou Kyutoku et.al. | 2503.00098 | null |
2025-02-14 | A Short History of Rocks: or, How to Invent Quantum Computing | David Wakeham et.al. | 2503.00005 | null |
2025-05-13 | Nano Drone-based Indoor Crime Scene Analysis | Martin Cooney et.al. | 2502.21019 | null |
2025-03-04 | Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff | Maximilian Holsman et.al. | 2502.20704 | link |
2025-02-28 | MonadBFT: Fast, Responsive, Fork-Resistant Streamlined Consensus | Mohammad Mussadiq Jalalzai et.al. | 2502.20692 | null |
2025-03-24 | Turbulence in virtual: Origin of the variance and skewness of density function | Xunchuan Liu et.al. | 2502.20458 | null |
2025-02-27 | Long-Context Inference with Retrieval-Augmented Speculative Decoding | Guanzheng Chen et.al. | 2502.20330 | link |
2025-04-28 | Frobenius subalgebra lattices in tensor categories | Mainak Ghosh et.al. | 2502.19876 | null |
2025-03-04 | Speculative Decoding and Beyond: An In-Depth Survey of Techniques | Yunhai Hu et.al. | 2502.19732 | null |
2025-02-26 | From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens | Tong Wu et.al. | 2502.18890 | link |
2025-02-26 | Reimagining Personal Data: Unlocking the Potential of AI-Generated Images in Personal Data Meaning-Making | Soobin Park et.al. | 2502.18853 | null |
2025-02-26 | Towards Optimal Multi-draft Speculative Decoding | Zhengmian Hu et.al. | 2502.18779 | null |
2025-03-02 | Variability of Central Stars of Planetary Nebulae with the Zwicky Transient Facility. II. Long-Timescale Variables including Wide Binary and Late Thermal Pulse Candidates | Soumyadeep Bhattacharjee et.al. | 2502.18651 | null |
2025-02-27 | Kinematics of metallicity populations in Omega Centauri using Gaia Focused Product Release and Hubble Space Telescope | Nagaraj Vernekar et.al. | 2502.17755 | null |
2025-02-24 | Knowledge Distillation with Training Wheels | Guanlin Liu et.al. | 2502.17717 | null |
2025-02-24 | THOR: A Non-Speculative Value Dependent Timing Side Channel Attack Exploiting Intel AMX | Farshad Dizani et.al. | 2502.17658 | null |
2025-02-24 | LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification | Penghui Yang et.al. | 2502.17421 | link |
2025-02-24 | Defects in the $β$-Ga$_2$O$_3$($\bar{2}01$)/HfO$_2$ MOS system and the effect of thermal treatments | Khushabu. S. Agrawal et.al. | 2502.17112 | null |
2025-05-25 | CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter | Yepeng Weng et.al. | 2502.16880 | null |
2025-02-24 | APINT: A Full-Stack Framework for Acceleration of Privacy-Preserving Inference of Transformers based on Garbled Circuits | Hyunjun Cho et.al. | 2502.16877 | null |
2025-04-03 | Towards Reinforcement Learning for Exploration of Speculative Execution Vulnerabilities | Evan Lai et.al. | 2502.16756 | null |
2025-02-22 | Fluctuating Lattice, Several Energy Scales | Holger Bech Nielsen et.al. | 2502.16369 | null |
2025-02-21 | DReSD: Dense Retrieval for Speculative Decoding | Milan Gritta et.al. | 2502.15572 | link |
2025-02-27 | PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System | Yintao He et.al. | 2502.15470 | null |
2025-02-24 | Ultra-high-energy $γ$-ray emission associated with the tail of a bow-shock pulsar wind nebula | Zhen Cao et.al. | 2502.15447 | null |
2025-02-21 | TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding | Zhaoxuan Wu et.al. | 2502.15197 | null |
2025-02-21 | A Critical Examination of the Nested Leaky Box Model for Galactic Cosmic Ray Transport | Benedikt Schroer et.al. | 2502.15115 | null |
2025-03-11 | FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling | Weilin Zhao et.al. | 2502.14856 | null |
2025-05-07 | Fusion rules and structure constants of E-series minimal models | Rongvoram Nivesvivat et.al. | 2502.14295 | null |
2025-02-19 | Which Attention Heads Matter for In-Context Learning? | Kayo Yin et.al. | 2502.14010 | link |
2025-03-17 | NVR: Vector Runahead on NPUs for Sparse Memory Access | Hui Wang et.al. | 2502.13873 | null |
2025-02-19 | Hierarchical accretion flow from the G351 infrared dark filament to its central cores | H. Beuther et.al. | 2502.13866 | null |
2025-02-19 | C2T: A Classifier-Based Tree Construction Method in Speculative Decoding | Feiye Huo et.al. | 2502.13652 | null |
2025-02-19 | Near-extremal dumb holes and some aspects of the Hawking effect | Akshat Pandey et.al. | 2502.13557 | null |
2025-02-19 | Radio observations of the ultra-long GRB 220627A reveal a hot cocoon supporting the blue supergiant progenitor scenario | James K. Leung et.al. | 2502.13435 | null |
2025-02-18 | Inconsistent metallicity spreads in first generation stars of globular clusters from high resolution spectroscopy and HST photometry | Eugenio Carretta et.al. | 2502.13206 | null |
2025-02-17 | SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | Yige Xu et.al. | 2502.12134 | null |
2025-02-16 | AI Generations: From AI 1.0 to AI 4.0 | Jiahao Wu et.al. | 2502.11312 | null |
2025-02-16 | Coherent Spin Pumping Originated from Sub-Terahertz Néel Vector Dynamics in Easy Plane α-Fe2O3/Pt | Gregory Fritjofson et.al. | 2502.11281 | null |
2025-02-16 | GRIFFIN: Effective Token Alignment for Faster Speculative Decoding | Shijing Hu et.al. | 2502.11018 | link |
2025-02-05 | QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache | Rishabh Tiwari et.al. | 2502.10424 | null |
2025-02-13 | Rosette Nebula Outburst Gaia 24djk from the Young Stellar Object V557 Mon | Adolfo S. Carvalho et.al. | 2502.09523 | null |
2025-02-13 | $^{18}$F-FDG brain PET hypometabolism in post-SARS-CoV-2 infection: substrate for persistent/delayed disorders? | Eric Guedj et.al. | 2502.09077 | null |
2025-02-13 | CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality | Razvan-Gabriel Dumitru et.al. | 2502.08923 | link |
2025-03-19 | Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding | Ziyao Wang et.al. | 2502.08020 | null |
2025-04-13 | Regular Black Holes in Lovelock gravity with a Degenerate AdS Ground State and their shadows | Milko Estrada et.al. | 2502.07992 | null |
2025-03-06 | Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs | Ruichen Zhang et.al. | 2502.07942 | null |
2025-02-05 | Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference | Toby Simonds et.al. | 2502.06833 | null |
2025-02-10 | Persistent spin grids with spin-orbit coupled 2D electron gas | A. V. Poshakinskiy et.al. | 2502.06745 | null |
2025-03-27 | LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models | Sihwan Park et.al. | 2502.06352 | link |
2025-02-10 | Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Haiduo Huang et.al. | 2502.06282 | link |
2025-02-08 | Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding | Sukmin Cho et.al. | 2502.05609 | link |
2025-01-31 | Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies | Nadav Timor et.al. | 2502.05202 | null |
2025-02-07 | Learning Universal Multi-level Market Irrationality Factors to Improve Stock Return Forecasting | Chen Yang et.al. | 2502.04737 | null |
2025-02-06 | Speeding up Speculative Decoding via Approximate Verification | Meiyu Zhong et.al. | 2502.04557 | null |
2025-02-06 | Gig2Gether: Data-sharing to Empower, Unify and Demystify Gig Work | Jane Hsieh et.al. | 2502.04482 | null |
2025-02-06 | The Evolution of Hypervelocity Supernova Survivors and the Outcomes of Interacting Double White Dwarf Binaries | Ken J. Shen et.al. | 2502.04451 | null |
2025-02-06 | Properties of the emission region in pulsars with opposite subpulse drift directions in different profile components | H. M. Tedila et.al. | 2502.03833 | null |
2025-02-05 | COSMOS-Web: The emergence of the Hubble Sequence | M. Huertas-Company et.al. | 2502.03532 | null |
2025-02-13 | FSLH: Flexible Mechanized Speculative Load Hardening | Roberto Blanco et.al. | 2502.03203 | null |
2025-02-05 | How probable is the Lyman-$α$ damping wing in the spectrum of the redshift z = 5.9896 quasar ULAS J0148+0600? | Fiona Sawyer et.al. | 2502.03085 | null |
2025-02-05 | A comprehensive study of the gas-phase formation network of HC$_5$N: theory, experiments, observations and models | Lisa Giani et.al. | 2502.03046 | null |
2025-04-17 | The connection between high-redshift galaxies and Lyman $α$ transmission in the Sherwood-Relics simulations of patchy reionisation | Luke Conaboy et.al. | 2502.02983 | null |
2025-02-05 | Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation | Jingyu Liu et.al. | 2502.02789 | link |
2025-02-04 | EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization | Yize Wu et.al. | 2502.02493 | null |
2025-02-04 | M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference | Nikhil Bhendawade et.al. | 2502.02040 | null |
2025-02-03 | Cosmic Ray Feedback in Massive Halos: Implications for the Distribution of Baryons | Eliot Quataert et.al. | 2502.01753 | null |
2025-02-01 | Speculative Ensemble: Fast Large Language Model Ensemble via Speculation | Jiale Fu et.al. | 2502.01662 | link |
2025-02-03 | Time-dependent solutions of biadjoint scalar field theories | Kymani Armstrong-Williams et.al. | 2502.01294 | null |
2025-02-02 | Constructing AI ethics narratives based on real-world data: Human-AI collaboration in data-driven visual storytelling | Mengyi Wei et.al. | 2502.00637 | null |
2025-02-01 | Predicting the number density of heavy seed massive black holes due to an intense Lyman-Werner field | Hannah O’Brennan et.al. | 2502.00574 | null |
2025-02-04 | Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation | Yang Cao et.al. | 2502.00500 | null |
2025-02-14 | Reward-Guided Speculative Decoding for Efficient LLM Reasoning | Baohao Liao et.al. | 2501.19324 | null |
2025-01-31 | Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment | Gregor Bachmann et.al. | 2501.19309 | null |
2025-02-19 | Emancipatory Information Retrieval | Bhaskar Mitra et.al. | 2501.19241 | null |
2025-01-31 | Trading Inference-Time Compute for Adversarial Robustness | Wojciech Zaremba et.al. | 2501.18841 | null |
2025-01-30 | Human Re-ID Meets LVLMs: What can we expect? | Kailash Hambarde et.al. | 2501.18698 | null |
2025-01-28 | How Hamilton-Jacobi formalism helps to address the physical meaning of the wave function in Bohmian mechanics | Arnaud Amblard et.al. | 2501.16989 | null |
2025-03-04 | Distilling Large Language Models for Network Active Queue Management | Deol Satish et.al. | 2501.16734 | null |
2025-01-24 | The disrupting and growing open cluster spiral arm patterns of the Milky Way | Xiaochen Liu et.al. | 2501.14215 | null |
2025-01-19 | Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks | Diego Gosmar et.al. | 2501.13946 | link |
2025-01-23 | Inflaton Self Resonance, Oscillons, and Gravitational Waves in Small Field Polynomial Inflation | Manuel Drees et.al. | 2501.13811 | null |
2025-01-23 | Considerations on the Origin of IRAS 19312+1950 Based on Long-Term Maser Observations | Huan-Xue Feng et.al. | 2501.13769 | null |
2025-01-23 | Compiler Support for Speculation in Decoupled Access/Execute Architectures | Robert Szafarczyk et.al. | 2501.13553 | null |
2025-02-01 | Concentration in Governance Control Across Decentralised Finance Protocols | Thomas Eisermann et.al. | 2501.13377 | link |
2025-01-22 | The outer structure of old star clusters in the Small Magellanic Cloud | Andrés E. Piatti et.al. | 2501.13062 | null |
2025-01-22 | Entanglement dynamics in collision models and entanglement quilts | Le Hu et.al. | 2501.12629 | null |
2025-01-22 | Link in $\mathbb{R}\mathbb{P}^3$ and the Topological Vertex | John Chae et.al. | 2501.12566 | null |
2025-01-21 | AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding | Zikun Li et.al. | 2501.12162 | null |
2025-01-20 | MIDIS: Quantifying the AGN component of X-ray-detected galaxies | Steven Gillman et.al. | 2501.11491 | null |
2025-01-23 | The JWST EXCELS survey: an extremely metal-poor galaxy at $z=8.271$ hosting an unusual population of massive stars | F. Cullen et.al. | 2501.11099 | null |
2025-01-30 | Vortices for lake equations (review with questions and speculations) | Jair Koiller et.al. | 2501.10433 | null |
2025-01-17 | From strong to weak correlations in breathing-mode kagome van der Waals materials: Nb$_3$(F,Cl,Br,I)$_8$ as a robust and versatile platform for many-body engineering | Joost Aretz et.al. | 2501.10320 | null |
2025-01-16 | 25 years of XMM-Newton observations of the Sgr A complex: 3D distribution and internal structure of the clouds | G. Stel et.al. | 2501.09737 | null |
2025-01-16 | Weak electronic correlations in the cobalt oxychalcogenide superconductor Na2CoSe2O | Zhenchao Wu et.al. | 2501.09675 | null |
2025-02-11 | Anatomy of a Digital Bubble: Lessons Learned from the NFT and Metaverse Frenzy | Daisuke Kawai et.al. | 2501.09601 | null |
2025-01-16 | A universal break in energy functions of three hyperactive repeating fast radio bursts | Q. Wu et.al. | 2501.09248 | null |
2025-01-15 | The emission of interpulses by a 6.45-hour period coherent radio transient | Y. W. J. Lee et.al. | 2501.09133 | null |
2025-01-13 | Cassiopeia A’s Reverse Shock and its Effects on the Expanding SN Ejecta | Robert A. Fesen et.al. | 2501.07708 | null |
2025-01-11 | Is the Monetary Transmission Mechanism Broken? Time for People’s Quantitative Easing | Sebastian Dragoe et.al. | 2501.06575 | null |
2025-01-27 | QPEs as Lense-Thirring precession of super-Eddington flows | M. Middleton et.al. | 2501.06185 | link |
2025-01-10 | Analysing the coverage of the University of Bologna’s publication metadata in an existing source of open research information | Erica Andreose et.al. | 2501.05821 | null |
2025-01-09 | Accelerated Diffusion Models via Speculative Sampling | Valentin De Bortoli et.al. | 2501.05370 | null |
2025-01-09 | The CO-Fuelled Time Machine: Tracing Birth Conditions and Terrestrial Planet Formation Outcomes in HD 163296 through Pebble Drift-induced CO Enhancements | Joe Williams et.al. | 2501.05316 | null |
2025-01-09 | Observational Study of the Atmospheric Gravity Waves in the lower Solar Atmosphere | Ravi Chaurasiya et.al. | 2501.05042 | null |
2025-01-07 | Transparent Decompilation for Timing Side-Channel Analyses | Santiago Arranz Olmos et.al. | 2501.04183 | null |
2025-01-07 | Spin Environment of a Superconducting Qubit in High Magnetic Fields | S. Günzler et.al. | 2501.03661 | null |
2025-01-07 | Neural Cellular Automata and Deep Equilibrium Models | Zhibai Jia et.al. | 2501.03573 | null |
2025-01-07 | CI at Scale: Lean, Green, and Fast | Dhruva Juloori et.al. | 2501.03440 | null |
2025-01-02 | Vertex algebras, topological defects, and Moonshine | Roberto Volpato et.al. | 2412.21141 | null |
2024-12-30 | Strategic Learning and Trading in Broker-Mediated Markets | Alif Aqsha et.al. | 2412.20847 | null |
2024-12-28 | From Worms to Mice: Homeostasis Maybe All You Need | Jesus Marco de Lucas et.al. | 2412.20090 | null |
2025-01-13 | HADES: Hardware Accelerated Decoding for Efficient Speculation in Large Language Models | Ze Yang et.al. | 2412.19925 | null |
2024-12-27 | Cosmohedra | Nima Arkani-Hamed et.al. | 2412.19881 | null |
2024-12-27 | Paleoinspired Vision: From Exploring Colour Vision Evolution to Inspiring Camera Design | Junjie Zhang et.al. | 2412.19439 | null |
2024-12-25 | Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference | Libo Zhang et.al. | 2412.18934 | null |
2024-12-25 | AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures | Situo Zhang et.al. | 2412.18910 | null |
2024-12-23 | The Unique Helium Nova V445 Puppis Ejected $\gg$0.001 M$_{\odot}$ in the Year 2000 and Will Not Become a Type Ia Supernova | Bradley E. Schaefer et.al. | 2412.17286 | null |
2024-12-20 | Gravitational Observatories in AdS$_4$ | Dionysios Anninos et.al. | 2412.16305 | null |
2024-12-20 | Two-Part Interplanetary Type II Solar Radio Bursts | Silja Pohjolainen et.al. | 2412.15961 | null |
2025-01-10 | Minimizing speculation overhead in a parallel recognizer for regular texts | Angelo Borsotti et.al. | 2412.14975 | null |
2025-01-13 | $\mathcal{N}=2$ superconformal gravitino in harmonic superspace | Evgeny Ivanov et.al. | 2412.14822 | null |
2025-02-07 | The JWST/NIRSpec view of the nuclear region in the prototypical merging galaxy NGC 6240 | Matteo Ceci et.al. | 2412.14685 | null |
2024-12-18 | Fermion-Portal Dark Matter at a High-Energy Muon Collider | Pouya Asadi et.al. | 2412.14235 | null |
2024-12-18 | Current and secular accretion rates of EX Hydrae | K. Beuermann et.al. | 2412.13850 | null |
2024-12-18 | Fool’s gold: ligand-receptor interactions and the origins of life | Betony Adams et.al. | 2412.13836 | null |
2024-12-18 | Diffusion models and stochastic quantisation in lattice field theory | Gert Aarts et.al. | 2412.13704 | null |
2024-12-17 | Distributed Speculative Execution for Resilient Cloud Applications | Tianyu Li et.al. | 2412.13314 | null |
2024-12-17 | Where do X-ray low surface brightness clusters sit with respect to filaments? | S. Zarattini et.al. | 2412.13258 | null |
2024-12-17 | Agnosticism About Artificial Consciousness | Tom McClelland et.al. | 2412.13145 | null |
2024-12-17 | Insight into the Starburst Nature of Galaxy GN-z11 with JWST MIRI Spectroscopy | J. Álvarez-Márquez et.al. | 2412.12826 | null |
2025-03-18 | Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models | Seungeun Oh et.al. | 2412.12687 | null |
2024-12-26 | Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | Xiangxiang Gao et.al. | 2412.12639 | null |
2024-12-15 | Heat kernel and local index theorem for open complex manifolds with $\mathbb{C}^{\ast}$-action | Jih-Hsin Cheng et.al. | 2412.11037 | null |
2024-12-14 | The JWST-NIRCam View of Sagittarius C. II. Evidence for Magnetically Dominated HII Regions in the CMZ | John Bally et.al. | 2412.10983 | null |
2025-02-23 | Interference in Fuzzy Dark Matter Filaments: Idealised Models and Statistics | Tim Zimmermann et.al. | 2412.10829 | null |
2025-02-10 | Constrained Decoding with Speculative Lookaheads | Nishanth Nakshatri et.al. | 2412.10418 | null |
2025-01-15 | Asymmetric Temperature Variations In Protoplanetary disks: I. Linear Theory, Corotating Spirals, and Ring Formation | Zhaohuan Zhu et.al. | 2412.09571 | null |
2024-12-12 | AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs’ Complex Reasoning Capabilities | Fabrizio Davide et.al. | 2412.09385 | null |
2024-12-11 | Can transformative AI shape a new age for our civilization?: Navigating between speculation and reality | Jesus L. Lobo et.al. | 2412.08273 | null |
2024-12-10 | Mapping the spatial extent of HI-rich absorbers using MgII absorption along gravitational arcs | Trystyn A. M. Berg et.al. | 2412.07652 | null |
2024-12-26 | CoinCLIP: A Multimodal Framework for Assessing Viability in Web3 Memecoins | Hou-Wan Long et.al. | 2412.07591 | null |
2024-12-10 | Modeling Speculative Trading Patterns in Token Markets: An Agent-Based Analysis with TokenLab | Mengjue Wang et.al. | 2412.07512 | null |
2024-12-10 | KPZ-like scaling on a high-dimensional hypersphere | Daniil Fedotov et.al. | 2412.07432 | null |
2024-12-10 | Exploring types I and IIA effective actions through T-duality | Mohammad R. Garousi et.al. | 2412.07234 | null |
2024-12-10 | Relativistic Mott transition in strongly correlated artificial graphene | Liguo Ma et.al. | 2412.07150 | null |
2024-12-10 | Gravitational focusing and horizon entropy for higher-spin fields | Zihan Yan et.al. | 2412.07107 | null |
2024-12-09 | Inelastic H + H$^+_3$ Collision rates and their impact in the determination of the excitation temperature of H$^+_3$ | Daniel Felix-Gonzalez et.al. | 2412.06697 | null |
2024-12-09 | Systematic comparison of deep generative models applied to multivariate financial time series | Howard Caulfield et.al. | 2412.06417 | null |
2024-12-09 | Beyond pip install: Evaluating LLM Agents for the Automated Installation of Python Projects | Louis Milliken et.al. | 2412.06294 | link |
2024-12-06 | Revisiting the hallmark freezing and melting points in colloidal dispersions and the search for the elusive coexistence region | J. Galen Wang et.al. | 2412.05422 | null |
2024-12-06 | Penetrative rotating magnetoconvection subject to lateral variations in temperature gradients | Tirtharaj Barman et.al. | 2412.05235 | null |
2024-12-06 | Predictive Window Decoding for Fault-Tolerant Quantum Programs | Joshua Viszlai et.al. | 2412.05115 | null |
2024-12-04 | Successive magnetic transitions in the spin-5/2 easy-axis triangular-lattice antiferromagnet Na$_2$BaMn(PO$_4$)$_2$: A neutron diffraction study | Chuandi Zhang et.al. | 2412.03149 | null |
2025-01-02 | The Reality of AI and Biorisk | Aidan Peppin et.al. | 2412.01946 | null |
2024-12-02 | PLD+: Accelerating LLM inference by leveraging Language Model Artifacts | Shwetha Somasundaram et.al. | 2412.01447 | null |
2024-12-02 | Enhanced solid solution hardening by off-center substitutional solute atoms in α-Ti | Zi-Han Yu et.al. | 2412.01298 | null |
2024-11-25 | Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration | Zhuofan Wen et.al. | 2412.00061 | null |
2024-11-12 | The Copernican Argument for Alien Consciousness; The Mimicry Argument Against Robot Consciousness | Eric Schwitzgebel et.al. | 2412.00008 | null |
2024-11-28 | Night-Side Relativistic Electron Precipitation Bursts in the Outer Radiation Belt: Insights from ELFIN and THEMIS | Xi Lu et.al. | 2411.19232 | null |
2024-11-27 | Magnetic field tuned superconducting and normal phase magnetism in CeCo$_{0.5}$Rh$_{0.5}$In$_{5}$ | A. Howell et.al. | 2411.18540 | null |
2024-11-27 | Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding | Ziyin Zhang et.al. | 2411.18462 | link |
2024-11-27 | 6G Takes Shape | Jeffrey G. Andrews et.al. | 2411.18435 | null |
2024-11-27 | An evolution of matrix-valued orthogonal polynomials | Erik Koelink et.al. | 2411.18362 | null |
2024-11-27 | Comprehensive Kernel Safety in the Spectre Era: Mitigations and Performance Evaluation (Extended Version) | Davide Davoli et.al. | 2411.18094 | null |
2024-12-25 | Stellar evolution along the AGB as revealed by the shape of Miras’ visual light curves | D. T. Hoai et.al. | 2411.18044 | null |
2024-11-26 | Stable curves and chromatic polynomials | Bernhard Reinke et.al. | 2411.17551 | null |
2024-12-08 | A revamped understanding of Cosmic Rays and Gamma-Ray Bursts | A. De Rújula et.al. | 2411.15850 | null |
2024-11-20 | The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz | David Noever et.al. | 2411.14486 | null |
2024-12-03 | Mediating Modes of Thought: LLM’s for design scripting | Moritz Rietschel et.al. | 2411.14485 | null |
2024-11-21 | THz optical response of Ba(Fe$_{1-x}$Ni$_x$)$_2$As$_2$ films analyzed within the three-band Eliashberg s$\pm$-wave model | Yurii A. Aleshchenko et.al. | 2411.14011 | null |
2024-11-27 | Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding | Hyun Ryu et.al. | 2411.13157 | null |
2024-11-20 | Far-field Boundary Conditions for Airfoil Simulation at High Incidence in Steady, Incompressible, Two-dimensional Flow | Narges Golmirzaee et.al. | 2411.13077 | null |
2024-11-19 | Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing | Ruyi Ding et.al. | 2411.12508 | null |
2024-11-18 | Continuous Speculative Decoding for Autoregressive Image Generation | Zili Wang et.al. | 2411.11925 | link |
2024-12-26 | Teapot: Efficiently Uncovering Spectre Gadgets in COTS Binaries | Fangzheng Lin et.al. | 2411.11624 | null |
2024-11-30 | Diversity of disc viscosities can explain the period ratios of resonant and non-resonant systems of hot super-Earths and mini-Neptunes | Bertram Bitsch et.al. | 2411.11452 | null |
2024-11-25 | First memoir on the asymptotics of certain infinite products | Wadim Zudilin et.al. | 2411.11100 | null |
2024-11-17 | FastDraft: How to Train Your Draft | Ofir Zafrir et.al. | 2411.11055 | null |
2024-12-16 | SAM Decoding: Speculative Decoding via Suffix Automaton | Yuxuan Hu et.al. | 2411.10666 | link |
2024-11-15 | Moving Forward: A Review of Autonomous Driving Software and Hardware Systems | Xu Wang et.al. | 2411.10291 | null |
2024-11-14 | Cosmic inflation in an extended non-commutative foliated quantum gravity: the wave function of the universe | César A. Zen Vasconcellos et.al. | 2411.09756 | null |
2024-11-15 | Provocation: Who benefits from “inclusion” in Generative AI? | Samantha Dalal et.al. | 2411.09102 | null |
2024-11-13 | Thought Experiments in Design Fiction for Visualization | Swaroop Panda et.al. | 2411.08621 | null |
2025-01-01 | A Geometric Substructure for Quantum Dynamics | Anthony John Bracken et.al. | 2411.08230 | null |
2025-01-11 | The Grass of the Universe: Rethinking Technosphere, Planetary History, and Sustainability with Fermi Paradox | Lukáš Likavčan et.al. | 2411.08057 | null |
2024-11-12 | A rich structure of renormalization group flows for Higgs-like models in 4 dimensions | André LeClair et.al. | 2411.07476 | null |
2024-11-12 | Input-Based Ensemble-Learning Method for Dynamic Memory Configuration of Serverless Computing Functions | Siddharth Agarwal et.al. | 2411.07444 | null |
2024-11-11 | The Inherent Adversarial Robustness of Analog In-Memory Computing | Corey Lammie et.al. | 2411.07023 | null |
2024-11-10 | Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents | Yu Gu et.al. | 2411.06559 | link |
2024-11-10 | MOCCA-III: Effects of pristine gas accretion and cluster migration on globular cluster evolution, global parameters and multiple stellar populations | Mirek Giersz et.al. | 2411.06421 | null |
2024-11-10 | Generating Mixcode Popular Songs with Artificial Intelligence: Concepts, Plans, and Speculations | Abhishek Kaushik et.al. | 2411.06420 | null |
2024-11-08 | SSSD: Simply-Scalable Speculative Decoding | Michele Marzollo et.al. | 2411.05894 | null |
2024-11-08 | SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding | Ryan Sun et.al. | 2411.05289 | link |
2024-11-07 | SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference | Gabriele Oliaro et.al. | 2411.04975 | null |
2024-11-06 | The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation | Lawrence Stewart et.al. | 2411.03786 | null |
2024-11-05 | Remarkable Scale Relation, Approximate SU(5), Fluctuating Lattice | Holger Bech Nielsen et.al. | 2411.03552 | null |
2024-11-05 | Shared Memory-Aware Latency-Sensitive Message Aggregation for Fine-Grained Communication | Kavitha Chandrasekar et.al. | 2411.03533 | null |
2024-11-07 | A high resolution simulation of protoplanetary disk turbulence driven by the vertical shear instability | Karim Shariff et.al. | 2411.03467 | null |
2024-11-04 | PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption | Yifan Tan et.al. | 2411.03357 | null |
2024-11-05 | On the possible core shift break in relativistic jets | E. E. Nokhrina et.al. | 2411.02925 | null |
2024-11-04 | A proof of self-organized criticality in a sandpile | Christopher Hoffman et.al. | 2411.02541 | null |
2025-02-07 | Pseudo Transitions in the Finite-Size Blume-Capel Model | Lei Shi et.al. | 2411.01743 | null |
2024-11-05 | Privacy Risks of Speculative Decoding in Large Language Models | Jiankun Wei et.al. | 2411.01076 | null |
2024-10-30 | Accelerated AI Inference via Dynamic Execution Methods | Haim Barad et.al. | 2411.00853 | null |
2024-10-30 | A Theoretical Perspective for Speculative Decoding Algorithm | Ming Yin et.al. | 2411.00841 | null |
2024-10-31 | Interpretable Language Modeling via Induction-head Ngram Models | Eunji Kim et.al. | 2411.00066 | link |
2024-10-31 | ALISE: Accelerating Large Language Model Serving with Speculative Scheduling | Youpeng Zhao et.al. | 2410.23537 | null |
2024-10-30 | Flavor Patterns of Fundamental Particles from Quantum Entanglement? | Jesse Thaler et.al. | 2410.23343 | null |
2024-10-29 | Lost and Found in Speculation: Hybrid Speculative Vulnerability Detection | Mohamadreza Rostami et.al. | 2410.22555 | null |
2025-02-10 | Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding | Bohan Li et.al. | 2410.21951 | null |
2024-10-29 | Rapid cooling of the Cassiopeia A neutron star due to superfluid quantum criticality | Hao-Fu Zhu et.al. | 2410.21945 | null |
2024-10-28 | Model-agnostic basis functions for the 2-point correlation function of dark matter in linear theory | Aseem Paranjape et.al. | 2410.21374 | link |
2024-10-11 | The Social Impact of Generative LLM-Based AI | Yu Xie et.al. | 2410.21281 | null |
2024-10-28 | On the limits of informationally efficient stock markets: New insights from a chartist-fundamentalist model | Laura Gardini et.al. | 2410.21198 | null |
2024-10-27 | A Jet-Induced Shock in a Young, Powerful Radio Galaxy at z=3.00 | Nick Seymour et.al. | 2410.20609 | null |
2024-10-27 | FIRP: Faster LLM inference via future intermediate representation prediction | Pengfei Wu et.al. | 2410.20488 | null |
2024-10-27 | Inevitable Trade-off between Watermark Strength and Speculative Sampling Efficiency for Language Models | Zhengmian Hu et.al. | 2410.20418 | null |
2024-10-31 | Fast Best-of-N Decoding via Speculative Rejection | Hanshi Sun et.al. | 2410.20290 | link |
2024-10-24 | Intention Is All You Need | Advait Sarkar et.al. | 2410.18851 | null |
2024-10-24 | AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability | Sudhanshu Agrawal et.al. | 2410.18351 | null |
2024-10-23 | Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits | Ashish Khisti et.al. | 2410.18234 | null |
2025-02-10 | Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition | Artem Basharin et.al. | 2410.17765 | null |
2024-10-22 | AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration | Bradley McDanel et.al. | 2410.17375 | link |
2024-10-22 | Remote Timing Attacks on Efficient Language Model Inference | Nicholas Carlini et.al. | 2410.17175 | null |
2024-10-23 | Quantum many-body scars as remnants of stable many-body periodic orbits | Keita Omiya et.al. | 2410.16916 | null |
2024-10-22 | Chiral polaritonics: cavity-mediated enantioselective excitation condensation | Rosario R. Riso et.al. | 2410.16861 | null |
2024-10-22 | An Extreme Radio Fluctuation of Pulsar B1929$+$10 | Zhengli Wang et.al. | 2410.16816 | null |
2024-10-21 | Galaxy Size and Mass Build-up in the First 2 Gyrs of Cosmic History from Multi-Wavelength JWST NIRCam Imaging | Natalie Allen et.al. | 2410.16354 | null |
2024-10-30 | TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling | Jiahao Qiu et.al. | 2410.16033 | null |
2024-10-21 | Efficient and Universally Accessible Cross-Chain Options without Upfront Holder Collateral | Zifan Peng et.al. | 2410.15724 | null |
2024-10-21 | Investigating Unusual H$\alpha$ Features towards the Scutum Supershell | R. Alsulami et.al. | 2410.15712 | null |
2024-10-17 | Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding | Tan Dat Nguyen et.al. | 2410.13839 | null |
2024-10-17 | Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions | Michael J. Q. Zhang et.al. | 2410.13788 | null |
2024-10-17 | Looking Inward: Language Models Can Learn About Themselves by Introspection | Felix J Binder et.al. | 2410.13787 | link |
2024-10-17 | PGC 44685: A Dwarf Star-forming Lenticular Galaxy with Wolf-Rayet Population | Shiying Lu et.al. | 2410.13119 | null |
2024-10-16 | Gravitational instantons and the quality problem of the QCD axion: Facts, speculations, and statements in between | Pier Giuseppe Catinari et.al. | 2410.12741 | null |
2024-10-15 | Evolution of Ferromagnetism and Electrical Resistivity in Sb-Doped Cr4PtGa17 | Chaoguo Wang et.al. | 2410.12078 | null |
2024-10-15 | MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation | Chenxi Wang et.al. | 2410.11779 | link |
2024-10-15 | DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure | Yunfan Xiong et.al. | 2410.11744 | null |
2024-10-15 | Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling | Wenda Xu et.al. | 2410.11325 | null |
2025-02-01 | QSpec: Speculative Decoding with Complementary Quantization Schemes | Juntao Zhao et.al. | 2410.11305 | null |
2024-11-20 | Unveiling dust, molecular gas, and high star formation efficiency in extremely UV bright star-forming galaxies at $z\sim 2.1-3.6$ | M. Dessauges-Zavadsky et.al. | 2410.11121 | null |
2024-10-01 | Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models | Keivan Alizadeh et.al. | 2410.10846 | null |
2024-10-15 | The Discovery of Polarized Water Vapor Megamaser Emission in a Molecular Accretion Disk | Jack F. Gallimore et.al. | 2410.10569 | null |
2024-10-14 | Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation | Siru Ouyang et.al. | 2410.10141 | null |
2024-11-12 | Self-Data Distillation for Recovering Quality in Pruned Large Language Models | Vithursan Thangarasa et.al. | 2410.09982 | null |
2024-10-13 | Super-Bandgap Electroluminescence from Cesium Lead Bromide | Justin Sculley et.al. | 2410.09702 | null |
2024-10-21 | On Two Nucleons Near Unitarity with Perturbative Pions | Yu Ping Teng et.al. | 2410.09653 | null |
2024-10-11 | Compact [OIII] emission-line regions (“Green Seeds”) in $\mathrm{H}\alpha$ emitters at Cosmic Noon from JWST Observations | Nuo Chen et.al. | 2410.08520 | null |
2024-10-09 | SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration | Heming Xia et.al. | 2410.06916 | link |
2025-02-06 | Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level | Xinyi Zeng et.al. | 2410.06809 | null |
2024-10-08 | ParallelSpec: Parallel Drafter for Efficient Speculative Decoding | Zilin Xiao et.al. | 2410.05589 | null |
2024-10-09 | Density estimation with LLMs: a geometric investigation of in-context learning trajectories | Toni J. B. Liu et.al. | 2410.05218 | null |
2024-10-08 | Efficient Inference for Large Language Model-based Generative Recommendation | Xinyu Lin et.al. | 2410.05165 | null |
2024-10-04 | Density functional theory based investigation of heavy fermion band candidates in triplet superconductor UTe2 | Shouzheng Liu et.al. | 2410.03840 | null |
2024-10-04 | Mixture of Attentions For Speculative Decoding | Matthieu Zimmer et.al. | 2410.03804 | null |
2024-10-03 | AI-rays: Exploring Bias in the Gaze of AI Through a Multimodal Interactive Installation | Ziyao Gao et.al. | 2410.03786 | null |
2024-09-24 | Nonmetric geometric flows and quasicrystalline topological phases for dark energy and dark matter in $f(Q)$ cosmology | L. Bubuianu et.al. | 2410.03700 | null |
2025-01-31 | LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding | Doohyuk Jang et.al. | 2410.03355 | null |
2024-10-04 | Generative Edge Detection with Stable Diffusion | Caixia Zhou et.al. | 2410.03080 | null |
2024-10-03 | Inductive Generative Recommendation via Retrieval-based Speculation | Yijie Ding et.al. | 2410.02939 | link |
2024-10-03 | The Stellar Initial Mass Function of Early Dark Matter-free Gas Objects | William Lake et.al. | 2410.02868 | null |
2024-10-03 | Atoms near a conducting wedge: decay rates and entanglement around a corner | Romuald Kilianski et.al. | 2410.02349 | null |
2024-10-02 | Time Variation of the Solar Tachocline | Sarbani Basu et.al. | 2410.01895 | null |
2024-12-25 | Interpretable Contrastive Monte Carlo Tree Search Reasoning | Zitian Gao et.al. | 2410.01707 | link |
2024-10-02 | Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding | Yao Teng et.al. | 2410.01699 | link |
2024-12-09 | Forte: Finding Outliers with Representation Typicality Estimation | Debargha Ganguly et.al. | 2410.01322 | link |
2024-10-02 | Speculative Coreset Selection for Task-Specific Fine-tuning | Xiaoyu Zhang et.al. | 2410.01296 | null |
2024-10-01 | Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity | Michael R. Metel et.al. | 2410.01028 | null |
2024-10-01 | A Scheduling-Aware Defense Against Prefetching-Based Side-Channel Attacks | Till Schlüter et.al. | 2410.00452 | null |
2024-11-12 | Galactic center G objects as dust-enshrouded stars near the supermassive black hole | Michal Zajaček et.al. | 2410.00304 | null |
2024-09-30 | Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface | Wenyue Hua et.al. | 2410.00079 | null |
2024-09-30 | Statistical view of orbital circularisation with 14 000 characterised TESS eclipsing binaries | L. W. IJspeert et.al. | 2409.20540 | null |
2024-09-30 | New HI observations Toward the NGC 5055 Galaxy Group with FAST | Xiao-Lan Liu et.al. | 2409.20109 | null |
2024-09-27 | Thermal Conductivity of Cubic Silicon Carbide Single Crystals Heavily Doped by Nitrogen | Zifeng Huang et.al. | 2409.18843 | null |
2024-09-27 | SpecCFA: Enhancing Control Flow Attestation/Auditing via Application-Aware Sub-Path Speculation | Adam Caulfield et.al. | 2409.18403 | null |
2024-09-25 | Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference | Zongyue Qin et.al. | 2409.16560 | null |
2024-09-22 | ALMASOP. The Localized and Chemically rich Features near the Bases of the Protostellar Jet in HOPS 87 | Shih-Ying Hsu et.al. | 2409.14445 | null |
2024-09-21 | Triangulating on Possible Futures: Conducting User Studies on Several Futures Instead of Only One | Antti Salovaara et.al. | 2409.14137 | null |
2024-09-29 | String Invention, Viable 3-3-1 Model, Dark Matter Black Holes | Holger B. Nielsen et.al. | 2409.13776 | null |
2024-09-20 | Interstellar Glycolaldehyde, Methyl Formate, and Acetic Acid. II. Chemical Modeling of the Bimodal Abundance Pattern in NGC 6334I | Brielle M. Shope et.al. | 2409.13673 | null |
2024-09-20 | A Comparison between Financial and Gambling Markets | Haoyu Liu et.al. | 2409.13528 | null |
2024-12-12 | Consequences of Minimal Entanglement in Bosonic Field Theories | Spencer Chang et.al. | 2409.13030 | null |
2024-09-17 | UNCOVER: Significant Reddening in Cosmic Noon Quiescent Galaxies | Jared Siegel et.al. | 2409.11457 | null |
2024-09-17 | The ALMA-CRISTAL Survey: Spatially-resolved Star Formation Activity and Dust Content in 4 < z < 6 Star-forming Galaxies | Juno Li et.al. | 2409.10961 | null |
2024-12-14 | Improving Multi-candidate Speculative Decoding | Xiaofan Lu et.al. | 2409.10644 | link |
2024-09-16 | Aggregation-diffusion in heterogeneous environments | Jonathan R. Potts et.al. | 2409.10147 | link |
2024-12-12 | Pure Lovelock Gravity regular black holes | Milko Estrada et.al. | 2409.09559 | null |
2024-09-14 | Ground State Phase Diagram of $\text{SU}(3)$ $t$-$J$ Chain | Junhao Zhang et.al. | 2409.09344 | null |
2024-12-02 | Two-Time Relativistic Bohmian Model of Quantum Mechanics | Giuseppe Raguní et.al. | 2409.09049 | null |
2024-09-13 | Dynamic Simultaneous Multithreaded Architecture | Daniel Ortiz-Arroyo et.al. | 2409.07903 | null |
2024-09-09 | DNA sequence alignment: An assignment for OpenMP, MPI, and CUDA/OpenCL | Arturo Gonzalez-Escribano et.al. | 2409.06075 | null |
2024-10-05 | Predicting Foreign Exchange EUR/USD direction using machine learning | Kevin Cedric Guyard et.al. | 2409.04471 | null |
2024-09-05 | Evidence for Dust Depletion in a Misaligned Protoplanetary Disk with JWST | C. C. Espaillat et.al. | 2409.03702 | null |
2024-09-04 | Cavitating bubbles in condensing gas as a means of forming clumps, chondrites, and planetesimals | Eugene Chiang et.al. | 2409.02978 | null |
2024-09-03 | Light-Ray Wave Functions and Integrability | Alexandre Homrich et.al. | 2409.02160 | null |
2024-09-03 | Foreactor: Exploiting Storage I/O Parallelism with Explicit Speculation | Guanzhou Hu et.al. | 2409.01580 | null |
2024-09-02 | A Comprehensive Analysis of the Future of Atomically Precise Manufacturing | Vadym Shvydun et.al. | 2409.00955 | null |
2024-08-30 | Dynamic Depth Decoding: Faster Speculative Decoding for LLMs | Oscar Brown et.al. | 2409.00142 | null |
2024-08-29 | LightSLH: Provable and Low-Overhead Spectre v1 Mitigation through Targeted Instruction Hardening | Yiming Zhu et.al. | 2408.16220 | null |
2024-08-28 | An Empirical Study of API Misuses of Data-Centric Libraries | Akalanka Galappaththi et.al. | 2408.15853 | null |
2024-08-28 | Indirect nonlinear interaction between toroidal Alfvén eigenmode and ion temperature gradient mode mediated by zonal structures | Qian Fang et.al. | 2408.15782 | null |
2024-09-19 | Learning Harmonized Representations for Speculative Sampling | Lefan Zhang et.al. | 2408.15766 | null |
2024-08-28 | Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation | Lujun Gui et.al. | 2408.15562 | null |
2024-11-18 | The companion mass distribution of post common envelope hot subdwarf binaries: evidence for boosted and disrupted magnetic braking? | Lisa Blomberg et.al. | 2408.15334 | null |
2024-08-27 | The Way To Circumbinary Planets | Hans J Deeg et.al. | 2408.15307 | null |
2024-12-26 | The Mamba in the Llama: Distilling and Accelerating Hybrid Models | Junxiong Wang et.al. | 2408.15237 | link |
2024-08-26 | SO as shock tracer in protoplanetary disks: the AB Aurigae case | A. Dutrey et.al. | 2408.14276 | null |
2024-08-25 | The origins of noise in the Zeeman splitting of spin qubits in natural-silicon devices | Juan S. Rojas-Arias et.al. | 2408.13707 | null |
2024-07-22 | Simopt – Simulation pass for Speculative Optimisation of FPGA-CAD flow | Eashan Wadhwa et.al. | 2408.12676 | null |
2024-12-19 | Exposing Shadow Branches | Chrysanthos Pepi et.al. | 2408.12592 | null |
2024-08-22 | Enhancing Causal Discovery in Financial Networks with Piecewise Quantile Regression | Cameron Cornell et.al. | 2408.12210 | null |
2024-08-21 | Electrostatic Origins of the Dirichlet Principle | Steven Deckelman et.al. | 2408.12002 | null |
2024-09-04 | Parallel Speculative Decoding with Adaptive Draft Length | Tianyu Liu et.al. | 2408.11850 | link |
2024-08-21 | Chemical models of interstellar glycine and adenine precursor aminoacetonitrile (NH2CH2CN) | Xia Zhang et.al. | 2408.11776 | null |
2024-08-20 | High detection significance of the dark substructure in gravitational lens SDSSJ0946+1006 is revealed by image pixel supersampling | Quinn E. Minor et.al. | 2408.11090 | null |
2024-08-23 | MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding | Jian Chen et.al. | 2408.11049 | link |
2024-08-20 | Revisiting the measurements and interpretations of DLVO forces | Bo Feng et.al. | 2408.10870 | null |
2024-08-19 | Constraining the Generalized Tolman-Oppenheimer-Volkoff (GTOV) equation with Bayesian analysis | Franciele M. da Silva et.al. | 2408.10425 | null |
2024-08-18 | A new measure of risk using Fourier analysis | Michael Grabinski et.al. | 2408.10279 | null |
2024-08-19 | Excitonic-trion population in two-dimensional halide perovskites | Efstratios Manousakis et.al. | 2408.10097 | null |
2024-08-16 | Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling | Xianzhen Luo et.al. | 2408.08696 | null |
2024-08-15 | KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning | Kaiqi Zhang et.al. | 2408.08146 | null |
2024-08-19 | Coupling without Communication and Drafter-Invariant Speculative Decoding | Majid Daliri et.al. | 2408.07978 | link |
2024-12-06 | The Small Sizes and High Implied Densities of ‘Little Red Dots’ with Balmer Breaks Could Explain Their Broad Emission Lines Without an AGN | Josephine F. W. Baggen et.al. | 2408.07745 | null |
2024-08-14 | Only One Relation Possible? Modeling the Ambiguity in Event Temporal Relation Extraction | Yutong Hu et.al. | 2408.07353 | null |
2024-07-23 | Stablecoin Runs and Disclosure Policy in the Presence of Large Sales | Brian Zhu et.al. | 2408.07227 | null |
2024-08-13 | Speculations on Uncertainty and Humane Algorithms | Nicholas Gray et.al. | 2408.06736 | null |
2024-08-15 | Inefficiencies of Carbon Trading Markets | Nicola Borri et.al. | 2408.06497 | null |
2024-08-12 | Correct Wrong Path | Bhargav Reddy Godala et.al. | 2408.05912 | null |
2024-08-11 | A Decoding Acceleration Framework for Industrial Deployable LLM-based Recommender Systems | Yunjia Xi et.al. | 2408.05676 | link |
2024-08-16 | Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion | Jacob K Christopher et.al. | 2408.05636 | null |
2024-08-09 | Recurrent Stochastic Fluctuations with Financial Speculation | Tomohiro Hirano et.al. | 2408.05047 | null |
2024-08-08 | HotStuff-1: Linear Consensus with One-Phase Speculation | Dakai Kang et.al. | 2408.04728 | null |
2024-08-08 | CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding | Sophia Ho et.al. | 2408.04678 | null |
2024-08-08 | Black hole mass and optical radiation mechanism of the tidal disruption event AT 2023clx | Shiyan Zhong et.al. | 2408.04448 | null |
2024-08-05 | Rich dynamical behaviors from a digital reversal operation | Yannis Almirantis et.al. | 2408.02527 | null |
2024-08-08 | A speculative model for cyclic information preservation in Kerr-Newman spacetime using closed timelike curves | Aviral Damle et.al. | 2408.02116 | null |
2024-08-06 | Selection bias obfuscates the discovery of fast radio burst sources | Mohit Bhardwaj et.al. | 2408.01876 | null |
2024-08-03 | Dissolution zone model of the oxide structure in additively manufactured dispersion-strengthened alloys | Wenyuan Hou et.al. | 2408.01845 | null |
2024-08-02 | AT2023vto: An Exceptionally Luminous Helium Tidal Disruption Event from a Massive Star | Harsh Kumar et.al. | 2408.01482 | null |
2024-08-01 | Granting GPT-4 License and Opportunity: Enhancing Accuracy and Confidence Estimation for Few-Shot Event Detection | Steven Fincke et.al. | 2408.00914 | null |
2024-08-01 | Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding | Bin Xiao et.al. | 2408.00264 | null |
2024-07-31 | Designing Beyond Current Conceptualizations of Spaceflight Experiences | James Cole et.al. | 2408.00085 | null |
2024-07-31 | Revisiting the fundamental metallicity relation with observation and simulation | Chengyu Ma et.al. | 2407.21716 | null |
2024-07-31 | The Bulk Densities of Small Solar System Bodies as a Probe of Planetesimal Formation | Misako Tatsuuma et.al. | 2407.21386 | null |
2024-08-19 | Instantons and the Large N=4 Algebra | Edward Witten et.al. | 2407.20964 | null |
2024-07-17 | Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies | Lachlan McGinness et.al. | 2407.20244 | null |
2024-08-19 | Reduced decay in Josephson coupling across ferromagnetic junctions with spin-orbit coupling layers | Ivan Kindiak et.al. | 2407.19799 | null |
2024-07-26 | Ionized and cold gas components in low surface brightness galaxy AGC 102004 | Tian-Wen Cao et.al. | 2407.18530 | null |
2024-07-25 | Phase transitions in (2 + 1)D subsystem-symmetric monitored quantum circuits | Cole Kelson-Packer et.al. | 2407.18340 | null |
2024-08-31 | Uniqueness of an $E_8$ model of elementary particles | Robert A. Wilson et.al. | 2407.18279 | null |
2024-07-24 | Automorphisms of Calabi-Yau threefolds from algebraic dynamics and the second Chern class | Keiji Oguiso et.al. | 2407.17297 | null |
2024-07-24 | Mapping the individual, social, and biospheric impacts of Foundation Models | Andrés Domínguez Hernández et.al. | 2407.17129 | null |
2024-07-04 | Integrated Deflector Shield Technology for Spacecraft | Florian Neukart et.al. | 2407.16701 | null |
2024-07-23 | Graph-Structured Speculative Decoding | Zhuocheng Gong et.al. | 2407.16207 | null |
2024-07-22 | AI for Handball: predicting and explaining the 2024 Olympic Games tournament with Deep Learning and Large Language Models | Florian Felice et.al. | 2407.15987 | null |
2024-07-22 | An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph | B. Kaan Karamete et.al. | 2407.15906 | null |
2024-07-23 | Unveiling the Multifaceted GRB 200613A: Prompt Emission Dynamics, Afterglow Evolution, and the Host Galaxy’s Properties | Shao-Yu Fu et.al. | 2407.15824 | null |
2024-11-21 | SNIP: Speculative Execution and Non-Interference Preservation for Compiler Transformations | Sören van der Wall et.al. | 2407.15080 | null |
2024-10-21 | Is the difference between deep hedging and delta hedging a statistical arbitrage? | Pascal François et.al. | 2407.14736 | link |
2024-07-19 | Rational Bubbles: A Clarification | Tomohiro Hirano et.al. | 2407.14017 | null |
2024-07-18 | Surface roughening in nanoparticle catalysts | Cameron J. Owen et.al. | 2407.13643 | null |
2024-07-18 | SecScale: A Scalable and Secure Trusted Execution Environment for Servers | Ani Sunny et.al. | 2407.13572 | null |
2024-07-17 | RTL Verification for Secure Speculation Using Contract Shadow Logic | Qinhan Tan et.al. | 2407.12232 | null |
2024-07-16 | Breakup dynamics of a neutron-halo projectile on heavy target at deep sub-barrier energies | B. Mukeru et.al. | 2407.12129 | null |
2024-11-16 | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | Branden Butler et.al. | 2407.11798 | null |
2024-10-02 | Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference | Zongyue Qin et.al. | 2407.09722 | null |
2024-07-17 | Accelerating the inference of string generation-based chemical reaction models for industrial applications | Mikhail Andronov et.al. | 2407.09685 | null |
2024-09-12 | Krylov complexity and chaos in deformed SYK models | Shira Chapman et.al. | 2407.09604 | null |
2024-07-21 | 6G: The Intelligent Network of Everything – A Comprehensive Vision, Survey, and Tutorial | Harri Pennanen et.al. | 2407.09398 | null |
2024-07-11 | Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting | Zilong Wang et.al. | 2407.08223 | null |
2024-07-10 | Purity benchmarking study of error coherence in a single Xmon qubit | Auda Zhu et.al. | 2407.07960 | null |
2024-07-10 | Carbon Pricing and Resale in Emission Trading Systems | Peyman Khezr et.al. | 2407.07386 | null |
2024-08-21 | Fuzzy Spheres in Stringy Matrix Models: Quantifying Chaos in a Mixed Phase Space | Paolo Amore et.al. | 2407.07259 | null |
2024-07-09 | Revolutionizing Battery Disassembly: The Design and Implementation of a Battery Disassembly Autonomous Mobile Manipulator Robot (BEAM-1) | Yanlong Peng et.al. | 2407.06590 | null |
2024-07-05 | Statistical investigations into the geometry and homology of random programs | Jon Sporring et.al. | 2407.04854 | null |
2024-07-05 | Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models | Bolaji Yusuf et.al. | 2407.04641 | null |
2024-11-13 | Black Holes with a charged quantum dust core | R. Casadio et.al. | 2407.04146 | null |
2024-08-23 | A distance conjecture beyond moduli? | Cédric Debusschere et.al. | 2407.03715 | null |
2024-07-03 | Braneworld Black Bounce to Transversable Wormhole Analytically Connected to an asymptotically $AdS_5$ Boundary | T. M. Crispim et.al. | 2407.03528 | null |
2024-07-03 | Origin of anomalous magnetotransport in kagome superconductors AV$_3$Sb$_5$ (A=K,Rb,Cs) | A. E. Koshelev et.al. | 2407.03189 | null |
2024-09-24 | Large-scale ordered magnetic fields generated in mergers of helium white dwarfs | Rüdiger Pakmor et.al. | 2407.02566 | null |
2024-07-02 | A thermodynamic model of inflation without inflaton field | Jesus Anaya-Galeana et.al. | 2407.02429 | null |
2024-07-02 | MICONIC: JWST/MIRI MRS observations of the nuclear and circumnuclear regions of Mrk231 | A. Alonso-Herrero et.al. | 2407.02180 | null |
2024-07-02 | S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models | Parsa Kavehzadeh et.al. | 2407.01955 | null |
2024-08-31 | Description of molecular chirality and its analysis with high harmonic generation | Akihito Kato et.al. | 2407.01947 | null |
2024-07-01 | Universal properties of residual moments in heavy-fermion metals | Ewan Scott et.al. | 2407.01218 | null |
2024-07-01 | Staying vigilant in the Age of AI: From content generation to content authentication | Yufan Li et.al. | 2407.00922 | null |
Multimodal System
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-20 | Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models | Michael Plainer et.al. | 2506.17139 | link |
2025-06-18 | VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service | Xiasi Wang et.al. | 2506.15755 | null |
2025-06-18 | Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model | Anirud Aggarwal et.al. | 2506.15682 | link |
2025-06-09 | Event-Priori-Based Vision-Language Model for Efficient Visual Understanding | Haotong Qin et.al. | 2506.07627 | null |
2025-06-15 | RNE: a plug-and-play framework for diffusion density estimation and inference-time control | Jiajun He et.al. | 2506.05668 | null |
2025-05-29 | Inference-time Scaling of Diffusion Models through Classical Search | Xiangcheng Zhang et.al. | 2505.23614 | null |
2025-05-27 | InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling | Xiaoxiao Jiang et.al. | 2505.20600 | null |
2025-05-25 | SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation | Shenggan Cheng et.al. | 2505.19151 | null |
2025-05-23 | VERDI: VLM-Embedded Reasoning for Autonomous Driving | Bowen Feng et.al. | 2505.15925 | null |
2025-05-20 | Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism | Kunyun Wang et.al. | 2505.14741 | null |
2025-04-14 | Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization | Haiyong Yu et.al. | 2504.09927 | null |
2025-03-17 | VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers | Ruanjun Li et.al. | 2503.09387 | null |
2025-02-20 | Light communicative materials | Hongshuang Guo et.al. | 2503.05744 | null |
2025-03-10 | Probing the Quantum Nature of Gravity through Classical Diffusion | Oliviero Angeli et.al. | 2501.13030 | null |
2025-01-16 | PATCHEDSERVE: A Patch Management Framework for SLO-Optimized Hybrid Resolution Diffusion Serving | Desen Sun et.al. | 2501.09253 | null |
2025-01-16 | StructSR: Refuse Spurious Details in Real-World Image Super-Resolution | Yachao Li et.al. | 2501.05777 | link |
2024-12-19 | Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model | Minglong Xue et.al. | 2412.14630 | link |
2024-12-06 | Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension | Xiyao Wang et.al. | 2412.03704 | link |
2024-12-05 | A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs | Wangbo Zhao et.al. | 2412.03324 | link |
2024-12-02 | [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster | Qizhe Zhang et.al. | 2412.01818 | link |
2025-03-30 | Staleness-Centric Optimizations for Parallel Diffusion MoE Inference | Jiajun Luo et.al. | 2411.16786 | null |
2024-10-29 | VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration | Dezhan Tu et.al. | 2410.23317 | null |
2025-01-07 | Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance | Dongmin Park et.al. | 2410.22376 | link |
2024-10-08 | A scaling limit for additive functionals | Thibaud Taillefumier et.al. | 2410.06383 | null |
2024-09-03 | CT-SDM: A Sampling Diffusion Model for Sparse-View CT Reconstruction across All Sampling Rates | Liutao Yang et.al. | 2409.01571 | null |
2024-07-27 | Faster Image2Video Generation: A Closer Look at CLIP Image Embedding’s Impact on Spatio-Temporal Cross-Attentions | Ashkan Taghipour et.al. | 2407.19205 | null |
2024-07-15 | LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis | Zhenxiong Tan et.al. | 2407.10468 | link |
2024-06-13 | DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning | Xuemin Hu et.al. | 2406.09089 | null |
2024-10-03 | I4VGen: Image as Free Stepping Stone for Text-to-Video Generation | Xiefan Guo et.al. | 2406.02230 | null |
2024-05-30 | DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation | Zachary Novack et.al. | 2405.20289 | null |
2024-05-26 | Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference | Xunpeng Huang et.al. | 2405.16387 | null |
2025-04-16 | Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models | Katherine Xu et.al. | 2405.14828 | null |
2024-04-25 | Inferring solid-state diffusivity in lithium-ion battery active materials: improving upon the classical GITT method | A. Emir Gumrukcuoglu et.al. | 2404.16658 | null |
2024-05-02 | Privacy-Preserving Diffusion Model Using Homomorphic Encryption | Yaojian Chen et.al. | 2403.05794 | link |
2024-05-08 | ToDo: Token Downsampling for Efficient Generation of High-Resolution Images | Ethan Smith et.al. | 2402.13573 | null |
2024-06-03 | DITTO: Diffusion Inference-Time T-Optimization for Music Generation | Zachary Novack et.al. | 2401.12179 | null |
2023-12-10 | Statistical Spatially Inhomogeneous Diffusion Inference | Yinuo Ren et.al. | 2312.05793 | null |
2024-01-04 | Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference | Zihao Yu et.al. | 2305.17423 | link |
2023-10-25 | ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval | Kexun Zhang et.al. | 2302.02285 | link |
2021-08-11 | Manifold-aware Synthesis of High-resolution Diffusion from Structural Imaging | Benoit Anctil-Robitaille et.al. | 2108.04135 | null |
2021-12-22 | Functional Data Analysis with Rough Sample Paths? | Neda Mohammadi et.al. | 2105.12035 | null |
2014-06-03 | $C^0$-estimates and smoothness of solutions to the parabolic equation defined by Kimura operators | Camelia A. Pop et.al. | 1406.0742 | null |
2015-04-01 | On nonnegative unbiased estimators | Pierre E. Jacob et.al. | 1309.6473 | null |