Authors:
Yangchao Wu, Yonatan Dukler, Matthew Trager, Wei Xia, Alessandro Achille, Stefano Soatto
Abstract:Speculative decoding is a method for accelerating inference in large language models (LLMs) by predicting multiple tokens using a smaller ‘draft model’ and validating them against the larger ‘base model.’ If a draft token is inconsistent with what the base model would have generated, speculative decoding ‘backtracks’ to the last consistent token before resuming generation. This is straightforward in autoregressive Transformer architectures since their state is a sliding window of past tokens. However, their baseline inference complexity is quadratic in the number of input tokens. State Space Models (SSMs) have linear inference complexity, but they maintain a separate Markov state that makes backtracking non-trivial. We propose two methods to perform speculative decoding in SSMs: “Joint Attainment and Advancement” and “Activation Replay.” Both methods utilize idle computational resources to speculate and verify multiple tokens, allowing us to produce 6 tokens for 1.47× the cost of one, corresponding to an average 1.82× wall-clock speed-up on three different benchmarks using a simple n-gram for drafting. Furthermore, as model size increases, relative overhead of speculation and verification decreases: Scaling from 1.3B parameters to 13B reduces relative overhead from 1.98× to 1.22×. Unlike Transformers, speculative decoding in SSMs can be easily applied to batches of sequences, allowing dynamic allocation of resources to fill gaps in compute utilization and thereby improving efficiency and throughput with variable inference traffic.
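As background for the draft-and-verify loop the abstract builds on, here is a minimal greedy-verification sketch in Python (generic speculative decoding, not the paper's SSM-specific Joint Attainment and Advancement or Activation Replay; all names are illustrative):

```python
def verify_drafts(target_next_tokens, draft_tokens):
    """Greedy verification step of speculative decoding.

    target_next_tokens[i] is assumed to be the target (base) model's own
    prediction for the token following position i, obtained in a single
    parallel forward pass over the prefix extended with the drafted tokens.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if tok == target_next_tokens[i]:
            accepted.append(tok)                    # draft agrees with the base model
        else:
            accepted.append(target_next_tokens[i])  # correction token from the base model
            return accepted, i                      # 'backtrack' point: state rolls back here
    accepted.append(target_next_tokens[len(draft_tokens)])  # bonus token when all drafts pass
    return accepted, len(draft_tokens)
```

The returned backtrack index is exactly what is trivial to honor with a Transformer's token window but requires the paper's extra machinery to honor with an SSM's recurrent state.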
Authors:
Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao
Abstract:Key-value (KV) caching has become the de-facto technique to accelerate generation speed for large language model (LLM) inference. However, the growing cache demand with increasing sequence length has transformed LLM inference into a memory-bound problem, significantly constraining system throughput. Existing methods rely on dropping unimportant tokens or quantizing entries group-wise. Such methods, however, often incur high approximation errors to represent the compressed matrices. The autoregressive decoding process further compounds the error of each step, resulting in critical deviation in model generation and deterioration of performance. To tackle this challenge, we propose GEAR, an efficient error reduction framework that augments a quantization scheme with two error reduction components and achieves near-lossless performance at high compression ratios. GEAR first applies quantization to the majority of entries of similar magnitudes at ultra-low precision. It then employs a low-rank matrix to approximate the quantization error, and a sparse matrix to remedy individual errors from outlier entries. By adeptly integrating the three techniques, GEAR is able to fully exploit their synergistic potential. Our experiments show that GEAR can maintain similar accuracy to that of an FP16 cache with improvement up to 24.42% over the SOTA baselines at 2-bit compression. Additionally, compared to LLM inference with an FP16 KV cache, GEAR can reduce peak memory by up to 2.39×, bringing 2.1× ∼ 5.07× throughput improvement. Our code will be publicly available.
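A toy numpy sketch of the three-component idea described above (uniform quantization, plus a low-rank approximation of the residual, plus a sparse outlier matrix); the thresholds, bit handling, and grouping are simplified stand-ins, not GEAR's actual scheme:

```python
import numpy as np

def gear_like_compress(X, bits=2, rank=4, outlier_frac=0.01):
    """Illustrative reconstruction from quantized + low-rank + sparse parts."""
    # (3) keep the largest-magnitude entries exactly as a sparse matrix
    k = max(1, int(outlier_frac * X.size))
    thresh = np.partition(np.abs(X).ravel(), -k)[-k]
    S = np.where(np.abs(X) >= thresh, X, 0.0)
    dense = X - S

    # (1) per-tensor uniform quantization of the remaining entries
    lo, hi = dense.min(), dense.max()
    scale = (hi - lo) / (2 ** bits - 1) + 1e-12
    dequant = np.round((dense - lo) / scale) * scale + lo

    # (2) low-rank approximation of the quantization error via truncated SVD
    E = dense - dequant
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]

    return dequant + L + S  # approximate reconstruction of X
```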
Authors:
Fabian Paischer, Lukas Hauzenberger, Thomas Schmied
Abstract:Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application. The most successful and most commonly used fine-tuning method is to modulate the pre-trained weights via a low-rank adaptation (LoRA) of newly introduced weights. These weight matrices are usually initialized at random with the same rank for each layer across the FM, which results in suboptimal performance. We propose to enhance LoRA by initializing the new weights in a data-driven manner, by computing singular value decomposition on activation vectors. Then, we initialize the new LoRA matrices with the obtained right-singular vectors. Finally, we re-distribute the ranks among layers to explain the maximal amount of variance across all layers. This assignment results in an adaptive allocation of ranks per weight matrix, and inherits all benefits of LoRA. We apply our new method, Explained Variance Adaptation (EVA), to a variety of fine-tuning tasks comprising language understanding and generation, image classification, and reinforcement learning. EVA consistently attains the highest average score across a multitude of tasks per domain.
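A rough sketch of the data-driven initialization idea (my interpretation, not the authors' released code): run a calibration batch through the frozen layer, take the top right-singular vectors of its input activations as the LoRA down-projection, and keep the per-component explained variance for the rank re-distribution step:

```python
import numpy as np

def eva_like_lora_init(activations, d_out, rank):
    """activations: (num_tokens, d_in) inputs seen by a frozen linear layer.
    Assumes num_tokens >= rank. Returns LoRA factors so that W + B @ A == W
    at initialization, plus per-component explained variance ratios."""
    _, s, Vt = np.linalg.svd(activations, full_matrices=False)
    A = Vt[:rank]                               # (rank, d_in): right-singular vectors
    B = np.zeros((d_out, rank))                 # zero up-projection keeps the layer unchanged
    explained = s[:rank] ** 2 / (s ** 2).sum()  # used to re-distribute ranks across layers
    return A, B, explained
```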
Authors:
Michael Pieler, Marco Bellagente, Hannah Teufel, Duy Phung
Abstract:Recently published work on rephrasing natural text data for pre-training LLMs has shown promising results when combining the original dataset with the synthetically rephrased data. We build upon previous work by replicating existing results on C4 and extending them with our optimized rephrasing pipeline to the English, German, Italian, and Spanish Oscar subsets of CulturaX. Our pipeline leads to increased performance on standard evaluation benchmarks in both the mono- and multilingual setup. In addition, we provide a detailed study of our pipeline, investigating the choice of the base dataset and LLM for the rephrasing, as well as the relationship between the model size and the performance after pre-training. By exploring data with different perceived quality levels, we show that gains decrease with higher quality. Furthermore, we find the difference in performance between model families to be bigger than between different model sizes. This highlights the necessity for detailed tests before choosing an LLM to rephrase large amounts of data. Moreover, we investigate the effect of pre-training with synthetic data on supervised fine-tuning. Here, we find increasing but inconclusive results that highly depend on the used benchmark. These results (again) highlight the need for better benchmarking setups. In summary, we show that rephrasing multilingual and low-quality data is a very promising direction to extend LLM pre-training data.
Authors:
Vui Seng Chua, Yujie Pan, Nilesh Jain
Abstract:We present Statistical Calibrated Activation Pruning (SCAP), a post-training activation pruning framework that (1) generalizes sparsification by input activations of Fully-Connected layers for generic and flexible application across Transformers, and (2) features a simple Mode-Centering technique to pre-calibrate activation distributions for maximizing post-training sparsity. Our results demonstrate robust Pareto efficiency compared to prior methods, translating to a 1.5× additional LLM decoding speedup against CATS[12] at iso model quality. SCAP effectiveness is empirically verified across a wide range of models, including recent Transformer Decoders, MoE, Mamba2, Encoding Transformer, and pre-quantized models, highlighting its practicality and scalability. The code is available here.
Authors:
Yuhui Xu, Zhanming Jie, Hanze Dong, Lei Wang, Xudong Lu, Aojun Zhou, Amrita Saha, Caiming Xiong, Doyen Sahoo
Abstract:Large Language Models (LLMs) have revolutionized the field of natural language processing, achieving unprecedented performance across a variety of applications. However, their increased computational and memory demands present significant challenges, especially when handling long sequences. This paper focuses on the long-context scenario, addressing the inefficiencies in KV cache memory consumption during inference. Unlike existing approaches that optimize the memory based on the sequence length, we identify substantial redundancy in the channel dimension of the KV cache, as indicated by an uneven magnitude distribution and a low-rank structure in the attention weights. In response, we propose ThinK, a novel query-dependent KV cache pruning method designed to minimize attention weight loss while selectively pruning the least significant channels. Our approach not only maintains or enhances model accuracy but also achieves a reduction in KV cache memory costs by over 20% compared with vanilla KV cache eviction and quantization methods. For instance, ThinK integrated with KIVI can achieve a 2.8× reduction in peak memory usage while maintaining nearly the same quality, enabling up to a 5× increase in batch size when using a single GPU. Extensive evaluations on the LLaMA and Mistral models across various long-sequence datasets verified the efficiency of ThinK, establishing a new baseline algorithm for efficient LLM deployment without compromising performance.
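A simplified illustration of query-dependent channel pruning of the key cache; the magnitude-based scoring rule below is a heuristic stand-in and may differ from ThinK's exact criterion:

```python
import numpy as np

def prune_key_channels(Q, K, keep_ratio=0.6):
    """Q: (num_queries, d), K: (num_keys, d).
    Score each channel by its rough contribution to the attention logits Q @ K.T
    and keep only the highest-scoring fraction of channels in the cached keys."""
    d = K.shape[1]
    scores = np.abs(Q).sum(axis=0) * np.abs(K).sum(axis=0)   # (d,) per-channel score
    keep = np.argsort(scores)[::-1][: int(keep_ratio * d)]
    return K[:, keep], keep   # store the shrunken keys plus the kept channel indices
```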
Authors:
Md Toki Tahmid, Haz Sameen Shahgir, Sazan Mahbub, Yue Dong, Md. Shamsuzzoha Bayzid
Abstract:Recent advancements in Transformer-based language models have spurred interest in their use for biological sequence analysis. However, adapting models like BERT is challenging due to sequence length, often requiring truncation for proteomics and genomics tasks. Additionally, advanced tokenization and relative positional encoding techniques for long contexts in NLP are often not directly transferable to DNA/RNA sequences, which require nucleotide or character-level encodings for tasks such as 3D torsion angle prediction, distance map prediction or secondary structure prediction. To tackle these challenges, we propose an adaptive dual tokenization scheme for bioinformatics that utilizes both nucleotide-level (NUC) and efficient BPE tokenizations. Building on the dual tokenization, we introduce BiRNA-BERT, a 117M parameter Transformer encoder pretrained with our proposed tokenization on 28 billion nucleotides across 36 million coding and non-coding RNA sequences. The learned representation by BiRNA-BERT generalizes across a range of applications. The BiRNA-BERT model achieves state-of-the-art results in long-sequence downstream tasks, performs comparably well in short-sequence tasks, and matches the performance in nucleotide-level structural prediction tasks of models six times larger in parameter size, while requiring 27 times less pre-training compute. In addition, our empirical experiments and ablation studies demonstrate that NUC is often preferable over BPE for bioinformatics tasks, given sufficient VRAM availability. We further demonstrate the applicability of the dual-pretraining and adaptive tokenization strategy by employing this concept on a DNA language model, which provides comparable performance to 66× more compute-heavy DNA language models. BiRNA-BERT can dynamically adjust its tokenization strategy based on sequence length, utilizing NUC for shorter sequences and switching to BPE for longer ones, thereby offering, for the first time, the capability to efficiently handle arbitrarily long DNA/RNA sequences.
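The length-based dispatch can be summarized in a few lines (a sketch; the 512-nucleotide threshold and the BPE encoder interface are placeholders, not BiRNA-BERT's actual configuration):

```python
def adaptive_tokenize(sequence, bpe_encode, max_nuc_len=512):
    """Use nucleotide-level tokens when the sequence fits in the context window,
    otherwise fall back to BPE so arbitrarily long sequences still fit."""
    if len(sequence) <= max_nuc_len:
        return list(sequence)     # NUC: one symbol per nucleotide
    return bpe_encode(sequence)   # BPE: coarser tokens for long sequences
```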
Authors:
Yoonsang Lee, Minsoo Kim, Seung-won Hwang
Abstract:This paper studies the problem of information retrieval, to adapt to unseen tasks. Existing work generates synthetic queries from domain-specific documents to jointly train the retriever. However, the conventional query generator assumes the query as a question, thus failing to accommodate general search intents. A more lenient approach incorporates task-adaptive elements, such as few-shot learning with a 137B LLM. In this paper, we challenge the trend equating query and question, and instead conceptualize the query generation task as a “compilation” of high-level intent into a task-adaptive query. Specifically, we propose EGG, a query generator that better adapts to wide search intents expressed in the BeIR benchmark. Our method outperforms baselines and existing models on four tasks with underexplored intents, while utilizing a query generator 47 times smaller than the previous state-of-the-art. Our findings reveal that instructing the LM with explicit search intent is a key aspect of modeling an effective query generator.
Authors:
Lawrence Stewart, Matthew Trager, Sujan Kumar Gonugondla, Stefano Soatto
Abstract:Speculative decoding aims to speed up autoregressive generation of a language model by verifying in parallel the tokens generated by a smaller draft model. In this work, we explore the effectiveness of learning-free, negligible-cost draft strategies, namely N-grams obtained from the model weights and the context. While the predicted next token of the base model is rarely the top prediction of these simple strategies, we observe that it is often within their top-k predictions for small k. Based on this, we show that combinations of simple strategies can achieve significant inference speedups over different tasks. The overall performance is comparable to more complex methods, yet does not require expensive preprocessing or modification of the base model, and allows for seamless ‘plug-and-play’ integration into pipelines.
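A minimal sketch of a context N-gram drafter of the kind described above (learning-free, negligible cost); the most-frequent-continuation rule and the index layout are illustrative, not the paper's exact strategy mix:

```python
from collections import defaultdict

def build_ngram_index(context_tokens, n=3):
    """Map each (n-1)-token prefix observed in the context to its next tokens."""
    index = defaultdict(list)
    for i in range(len(context_tokens) - n + 1):
        prefix = tuple(context_tokens[i : i + n - 1])
        index[prefix].append(context_tokens[i + n - 1])
    return index

def draft_from_context(context_tokens, index, num_draft=4, n=3):
    """Repeatedly extend the sequence with the most frequent continuation of its
    last (n-1) tokens; stop as soon as the prefix has never been seen."""
    drafted = list(context_tokens)
    out = []
    for _ in range(num_draft):
        candidates = index.get(tuple(drafted[-(n - 1):]))
        if not candidates:
            break
        nxt = max(set(candidates), key=candidates.count)  # most frequent follower
        out.append(nxt)
        drafted.append(nxt)
    return out
```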
Authors:
Jan Ludziejewski, Jan Małaśnicki, Maciej Pióro, Michał Krutul, Kamil Ciebiera, Maciej Stefaniak, Jakub Krajewski, Piotr Sankowski, Marek Cygan, Kamil Adamczewski, Sebastian Jaszczur
Abstract:In this work, we introduce a novel approach for optimizing neural network training by adjusting learning rates across weights of different components in Transformer models. Traditional methods often apply a uniform learning rate across all network layers, potentially overlooking the unique dynamics of each part. Remarkably, our introduced Relative Learning Rate Schedules (RLRS) method accelerates the training process by up to 23%, particularly in complex models such as Mixture of Experts (MoE). Hyperparameters of RLRS can be efficiently tuned on smaller models and then extrapolated to 27× larger ones. This simple and effective method results in a substantial reduction in training time and computational resources, offering a practical and scalable solution for optimizing large-scale neural networks.
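A sketch of how relative learning rates can be wired up through per-component optimizer parameter groups; the name patterns and multiplier values below are invented for illustration, not the tuned RLRS values:

```python
import torch

def build_relative_lr_groups(model, base_lr, multipliers):
    """Assign each parameter a learning rate of base_lr * multiplier, matched by
    substring of its name; unmatched parameters keep the base learning rate."""
    groups, default = [], []
    for name, p in model.named_parameters():
        for pattern, mult in multipliers.items():
            if pattern in name:
                groups.append({"params": [p], "lr": base_lr * mult})
                break
        else:
            default.append(p)
    if default:
        groups.append({"params": default, "lr": base_lr})
    return torch.optim.AdamW(groups)

# e.g. build_relative_lr_groups(model, 3e-4, {"embed": 0.5, "router": 2.0, "expert": 1.2})
```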
Authors:
Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Oren Pereg, Moshe Wasserblat, Tomer Galanti, Michal Gordon, David Harel
Abstract:Accelerating the inference of large language models (LLMs) is an important challenge in artificial intelligence. This paper introduces distributed speculative inference (DSI), a novel distributed inference algorithm that is provably faster than speculative inference (SI) [Leviathan et al., 2023, Chen et al., 2023, Miao et al., 2023] and traditional autoregressive inference (non-SI). Like other SI algorithms, DSI works on frozen LLMs, requiring no training or architectural modifications, and it preserves the target distribution. Prior studies on SI have demonstrated empirical speedups (compared to non-SI) but require a fast and accurate drafter LLM. In practice, off-the-shelf LLMs often do not have matching drafters that are sufficiently fast and accurate. We show a gap: SI gets slower than non-SI when using slower or less accurate drafters. We close this gap by proving that DSI is faster than both SI and non-SI, given any drafters. By orchestrating multiple instances of the target and drafters, DSI is not only faster than SI but also supports LLMs that cannot be accelerated with SI. Our simulations show speedups of off-the-shelf LLMs in realistic settings: DSI is 1.29-1.92x faster than SI. Our code is open-sourced: github.com/keyboardAnt/Distributed-Speculative-Inference
Authors:
João Monteiro, Étienne Marcotte, Pierre-André Noël, Valentina Zantedeschi, David Vázquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian
Abstract:Prompts are often employed to condition decoder-only language model generation on reference information. Just-in-time processing of a context is inefficient due to the quadratic cost of self-attention operations, and caching is desirable. However, caching transformer states can easily require almost as much space as the model parameters. When the right context is not known in advance, caching the prompt can be challenging. This work addresses these limitations by introducing models that, inspired by the encoder-decoder architecture, use cross-attention to condition generation on reference text without the prompt. More precisely, we leverage pre-trained decoder-only models and only train a small number of added layers. We use Question-Answering (QA) as a testbed to evaluate the ability of our models to perform conditional generation and observe that they outperform prompt-based inference methods, are comparable to fine-tuned prompted LLMs, and drastically reduce the space footprint relative to standard KV caching by two orders of magnitude. Specifically, we introduce XC-LLAMA, which converts a pre-trained LLAMA 2 into an encoder-decoder architecture by integrating cross-attention layers interleaved in between existing self-attention layers.
Authors:
Mohammadreza Ardestani, Yllias Chali
Abstract:This study aimed to leverage graph information, particularly Rhetorical Structure Theory (RST) and Co-reference (Coref) graphs, to enhance the performance of our baseline summarization models. Specifically, we experimented with a Graph Attention Network architecture to incorporate graph information. However, this architecture did not enhance the performance. Subsequently, we used a simple Multi-layer Perceptron architecture, which improved the results in our proposed model on our primary dataset, CNN/DM. Additionally, we annotated the XSum dataset with RST graph information, establishing a benchmark for future graph-based summarization models. This secondary dataset posed multiple challenges, revealing both the merits and limitations of our models.
Authors:
Albert Kjøller Jacobsen, Teresa Dorszewski, Lenka Tětková, Lars Kai Hansen
Abstract:Self-supervised speech representation models, particularly those leveraging transformer architectures, have demonstrated remarkable performance on downstream tasks. Recent studies revealed high redundancy of transformer layers, potentially allowing for smaller models and more efficient inference. We perform a detailed analysis of layer similarity in speech models, leveraging three similarity metrics. Our findings reveal a block-like structure of high similarity, suggesting significant redundancy within the blocks along with two main processing steps that are both found to be critical for maintaining performance. We demonstrate the effectiveness of pruning transformer-based speech models without post-training, achieving up to 40% reduction in transformer layers while maintaining 95% of the model’s predictive capacity. Lastly, we show that replacing the transformer stack with a few simple layers can reduce the network size by up to 95% and inference time by up to 87%, significantly reducing the computational footprint with minimal performance loss, revealing the benefits of model simplification for downstream applications.
Authors:
Jonathan Mamou, Oren Pereg, Daniel Korat, Moshe Berchansky, Nadav Timor, Moshe Wasserblat, Roy Schwartz
Abstract:Speculative decoding is commonly used for reducing the inference latency of large language models. Its effectiveness depends highly on the speculation lookahead (SL): the number of tokens generated by the draft model at each iteration. In this work we show that the common practice of using the same SL for all iterations (static SL) is suboptimal. We introduce DISCO (DynamIc SpeCulation lookahead Optimization), a novel method for dynamically selecting the SL. Our experiments with four datasets show that DISCO reaches an average speedup of 10% compared to the best static SL baseline, while generating the exact same text.
Authors:
Sudhanshu Agrawal, Wonseok Jeon, Mingu Lee
Abstract:Speculative decoding [1] is a powerful technique that attempts to circumvent the autoregressive constraint of modern Large Language Models (LLMs). The aim of speculative decoding techniques is to improve the average inference time of a large target model without sacrificing its accuracy, by using a more efficient draft model to propose draft tokens which are then verified in parallel. The number of draft tokens produced in each drafting round is referred to as the draft length and is often a static hyperparameter chosen based on the acceptance rate statistics of the draft tokens. However, setting a static draft length can negatively impact performance, especially in scenarios where drafting is expensive and there is a high variance in the number of tokens accepted. Adaptive Entropy-based Draft Length (AdaEDL) is a simple, training- and parameter-free criterion that allows for early stopping of the token drafting process by approximating a lower bound on the expected acceptance probability of the drafted token based on the currently observed entropy of the drafted logits. We show that AdaEDL consistently outperforms static draft-length speculative decoding by 10%-57% as well as other training-free draft-stopping techniques by up to 10% in a variety of settings and datasets. At the same time, we show that AdaEDL is more robust than these techniques and preserves performance in high-sampling-temperature scenarios. Since it is training-free, in contrast to techniques that rely on the training of dataset-specific draft-stopping predictors, AdaEDL can seamlessly be integrated into a variety of pre-existing LLM systems.
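A small sketch of entropy-based draft stopping; AdaEDL turns the drafter's entropy into a principled lower bound on acceptance probability, whereas the plain threshold below is only meant to illustrate the mechanism:

```python
import numpy as np

def should_stop_drafting(draft_logits, entropy_threshold=2.5):
    """When the drafter is uncertain (high entropy over its next-token logits),
    its token is less likely to be accepted, so stop drafting early and hand
    control back to the target model."""
    z = draft_logits - draft_logits.max()            # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    entropy = -(probs * np.log(probs + 1e-12)).sum()
    return entropy > entropy_threshold
```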
Authors:
Zexin Chen, Chengxi Li, Xiangyu Xie, Parijat Dube
Abstract:This paper explores the potential of a small, domain-specific language model trained exclusively on sports-related data. We investigate whether extensive training data with specially designed small model structures can overcome model size constraints. The study introduces the OnlySports collection, comprising OnlySportsLM, OnlySports Dataset, and OnlySports Benchmark. Our approach involves: 1) creating a massive 600 billion token OnlySports Dataset from FineWeb, 2) optimizing the RWKV-v6 architecture for sports-related tasks, resulting in a 196M parameter model with a 20-layer, 640-dimension structure, 3) training OnlySportsLM on part of the OnlySports Dataset, and 4) testing the resultant model on OnlySports Benchmark. OnlySportsLM achieves 37.62%/34.08% accuracy improvements over previous 135M/360M state-of-the-art models and matches the performance of larger models such as SmolLM 1.7B and Qwen 1.5B in the sports domain. Additionally, the OnlySports collection presents a comprehensive workflow for building high-quality, domain-specific language models, providing a replicable blueprint for efficient AI development across various specialized fields.
Authors:
Ashwinee Panda, Vatsal Baherwani, Benjamin Thérien, Stephen Rawls, Sambit Sahu, Supriyo Chakraborty, Tom Goldstein
Abstract:Mixture of Experts (MoE) pretraining is more scalable than dense Transformer pretraining, because MoEs learn to route inputs to a sparse set of their feedforward parameters. However, this means that MoEs only receive a sparse backward update, leading to problems such as router load imbalance where some experts receive more tokens than others. We present a lightweight approximation method that gives the MoE a dense gradient while only sparsely activating its parameters. A key insight into the design of our method is that at scale, many tokens not routed to a given expert may nonetheless lie in the span of tokens that were routed to that expert, allowing us to create an approximation for the expert output of that token from existing expert outputs. Our dense backpropagation outperforms standard TopK routing across multiple MoE configurations without increasing runtime.
Authors:
Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, Jindong Chen
Abstract:Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly in the number of retrieved documents, causing a dramatic increase in latency. In this paper, we propose a novel paradigm named Sparse RAG, which seeks to cut computation costs through sparsity. Specifically, Sparse RAG encodes retrieved documents in parallel, which eliminates latency introduced by long-range attention of retrieved documents. Then, LLMs selectively decode the output by only attending to highly relevant caches auto-regressively, which are chosen via prompting LLMs with special control tokens. It is notable that Sparse RAG combines the assessment of each individual document and the generation of the response into a single process. The designed sparse mechanism in a RAG system can facilitate the reduction of the number of documents loaded during decoding for accelerating the inference of the RAG system. Additionally, filtering out undesirable contexts enhances the model’s focus on relevant context, inherently improving its generation quality. Evaluation results on two datasets show that Sparse RAG can strike an optimal balance between generation quality and computational efficiency, demonstrating its generalizability across both short- and long-form generation tasks.
Authors:
Yanyuan Qiao, Zheng Yu, Zijia Zhao, Sihan Chen, Mingzhen Sun, Longteng Guo, Qi Wu, Jing Liu
Abstract:Multimodal large language models (MLLMs) have gained considerable attention due to their ability to integrate visual and textual information, enhancing understanding and providing context for complex tasks. While Transformer-based architectures have been the dominant framework for MLLMs, recent studies suggest that state space models (SSMs) like Mamba can achieve competitive or even superior performance. However, no prior research has investigated the potential of SSMs to replace Transformers in multimodal tasks, which are inherently more challenging due to the heterogeneity of visual and language data and the complexities of aligning these modalities. In this paper, we introduce VL-Mamba, the first study to explore the application of state space models in multimodal learning tasks. VL-Mamba leverages a pretrained Mamba language model as its core, and we propose a novel MultiModal Connector (MMC) that incorporates a Vision Selective Scan (VSS) module to improve visual sequence modeling. We empirically explore how to effectively apply the 2D vision selective scan mechanism for multimodal learning and the combinations of different vision encoders and variants of pretrained Mamba language models. Our experiments across multiple multimodal benchmarks demonstrate that VL-Mamba achieves competitive performance against small MLLMs of similar size, and in some cases, surpasses larger models such as the 7B and 13B versions of LLaVA-1.5. These results suggest that state space models have the potential to serve as an alternative to Transformers in multimodal learning tasks.
Authors:
Edoardo Cetin, Qi Sun, Tianyu Zhao, Yujin Tang
Abstract:We introduce Neural Attention Memory Models (NAMMs) to improve the performance and efficiency of transformer foundation models. NAMMs are evolved atop pre-trained transformers to provide different latent contexts containing the most relevant information for individual layers and attention heads. NAMMs are universally applicable to any model using self-attention as they condition exclusively on the attention matrices produced in each layer. NAMMs learned on a relatively small set of problems substantially improve performance across multiple unseen long-context language tasks while cutting the model’s input contexts up to a fraction of the original sizes, setting them apart from prior hand-designed KV cache eviction strategies that only aim to preserve model behavior. We show the generality of our conditioning enables zero-shot transfer of NAMMs trained only on language to entirely new transformer architectures even across input modalities, with their benefits carrying over to vision and reinforcement learning. Our source code is available at https://github.com/SakanaAI/evo-memory.
Authors:
R. Gnana Praveen, Jahangir Alam
Abstract:Speaker Verification has achieved significant progress using advanced deep learning architectures specialized for speech signals as well as robust loss functions. Recently, fusion of faces and voices has received a lot of attention as they offer a complementary relationship with each other, outperforming unimodal approaches. In this work, we have investigated the potential of Vision Transformers (ViTs), pre-trained on visual data, for audio-visual speaker verification. To cope with the challenges of large-scale training, we introduce the Latent Audio-Visual Vision Transformer (LAVViT) adapters, where we exploit the existing pre-trained models on visual data by training only the parameters of LAVViT adapters, without fine-tuning the original parameters of the pre-trained models. The LAVViT adapters are injected into every layer of the ViT architecture to effectively fuse the audio and visual modalities using a small set of latent tokens, thereby mitigating the quadratic computational cost of cross-attention across the modalities. The proposed approach further circumvents the need for modality-specific architectures by employing the same ViT architecture with shared pretrained weights for the audio and visual modalities. The proposed approach has been evaluated by conducting extensive experiments on the VoxCeleb1 dataset and shows promising performance using only a few trainable parameters.
Authors:
Harry Jake Cunningham, Marc Peter Deisenroth
Abstract:Hybrid attention architectures have shown promising success in both equipping self attention with inductive bias for long-sequence modelling and reducing the computational burden of transformers without sacrificing quality. This paper introduces Composite Attention, a theoretical framework for analyzing the combination of sequence mixing primitives in modern deep learning architectures. Utilizing the definition of sequence mixers as structured linear maps, we formalize the composition of sequence mixing primitives as either sequential or recurrent composition.
Authors:
Shashank Rajput, Sean Owen, Ying Sheng, Vitaliy Chiley
Abstract:The size of the key-value (KV) cache plays a critical role in determining both the maximum context length and the number of concurrent requests supported during inference in modern language models. The KV cache size grows proportionally with the number of attention heads and the tokens processed, leading to increased memory consumption and slower inference for long inputs. In this work, we explore the use of MixAttention, a model architecture modification closely related to a blog published by Character.AI [Character.AI, 2024]. MixAttention combines sliding window attention, where only a small subset of recent tokens is stored in the KV cache, with KV cache sharing across layers. Our experiments demonstrate that MixAttention significantly reduces memory usage and improves inference speed without sacrificing model performance in both short and long-context tasks. We also explore various configurations of this architecture, identifying those that maintain quality across evaluation metrics while optimizing resource efficiency.
Authors:
Parsa Kavehzadeh, Mohammadreza Pourreza, Mojtaba Valipour, Tianshu Zhu, Haoli Bai, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
Abstract:Deployment of autoregressive large language models (LLMs) is costly, and as these models increase in size, the associated costs will become even more considerable. Consequently, different methods have been proposed to accelerate the token generation process and reduce costs. Speculative decoding (SD) is among the most promising approaches to speed up the LLM decoding process by verifying multiple tokens in parallel and using an auxiliary smaller draft model to generate the possible tokens. In SD, usually one draft model is used to serve a specific target model; however, in practice, LLMs are diverse, and we might need to deal with many target models or more than one target model simultaneously. In this scenario, it is not clear which draft model should be used for which target model, and searching among different draft models, or training customized draft models, can further increase deployment costs. In this paper, we first introduce a novel multi-target scenario for deployment of draft models for faster inference. Then, we present a novel, more efficient sorted speculative decoding mechanism that outperforms regular baselines in the multi-target setting. We evaluated our method on Spec-Bench in different settings, including base models such as Vicuna 7B, 13B, and LLama Chat 70B. Our results suggest that our draft models perform better than baselines for multiple target models at the same time.
Authors:
Anton Frederik Thielmann, Soheila Samiee
Abstract:Recent advancements in tabular deep learning (DL) have led to substantial performance improvements, surpassing the capabilities of traditional models. With the adoption of techniques from natural language processing (NLP), such as language model-based approaches, DL models for tabular data have also grown in complexity and size. Although tabular datasets do not typically pose scalability issues, the escalating size of these models has raised efficiency concerns. Despite its importance, efficiency has been relatively underexplored in tabular DL research. This paper critically examines the latest innovations in tabular DL, with a dual focus on performance and computational efficiency. The source code is available at https://github.com/basf/mamba-tabular.
Authors:
Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi
Abstract:We introduce OLMOE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMOE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMOE-1B-7B-INSTRUCT. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present novel findings on MoE training, define and analyze new routing properties showing high specialization in our model, and open-source all our work: model weights, training data, code, and logs.
Authors:
Mohammad Samragh, Minsik Cho, Iman Mirzadeh, Moin Nabi, Keivan Alizadeh Vahid, Devang Naik, Fartash Faghri, Mehrdad Farajtabar
Abstract:The pre-training phase of language models often begins with randomly initialized parameters. With the current trends in scaling models, training their large number of parameters can be extremely slow and costly. In contrast, small language models are less expensive to train, but they often cannot achieve the accuracy of large models. In this paper, we explore an intriguing idea to connect these two different regimes: Can we develop a method to initialize large language models using smaller pre-trained models? Will such initialization bring any benefits in terms of training time and final accuracy? In this paper, we introduce HyperCloning, a method that can expand the parameters of a pre-trained language model to those of a larger model with increased hidden dimensions. Our method ensures that the larger model retains the functionality of the smaller model. As a result, the larger model already inherits the predictive power and accuracy of the smaller model before the training starts. We demonstrate that training such an initialized model results in significant savings in terms of GPU hours required for pre-training large language models. Implementation of HyperCloning is available at https://github.com/apple/ml-hypercloning/tree/main.
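The function-preserving expansion can be illustrated for a single linear layer (a simplified sketch in the spirit of HyperCloning; the official implementation linked above covers full architectures and may differ in details):

```python
import numpy as np

def clone_linear_weight(W, factor=2):
    """Tile the weight into a (factor x factor) block grid and divide by `factor`
    so that a duplicated input vector produces a duplicated output vector,
    i.e. the wider layer computes the same function as the original one."""
    return np.tile(W, (factor, factor)) / factor

# sanity check: duplicated input -> duplicated output
W = np.random.randn(4, 3)
x = np.random.randn(3)
assert np.allclose(np.tile(W @ x, 2), clone_linear_weight(W) @ np.tile(x, 2))
```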
Authors:
Moyang Liu, Kaiying Yan, Yukun Liu, Ruibo Fu, Zhengqi Wen, Xuefei Liu, Chenxing Li
Abstract:The rapid growth of social media has led to the widespread dissemination of misinformation across multiple content forms, including text, images, audio, and video. Compared to unimodal misinformation detection, multimodal misinformation detection benefits from the increased availability of information across multiple modalities. However, these additional features may introduce redundancy, where overlapping or irrelevant information is included, potentially disrupting the feature space and consequently impairing the model’s performance. To address the issue, we propose a novel framework, Misinformation Detection Mixture of Experts (MisD-MoE), which employs distinct expert models for each modality and incorporates an adaptive feature selection mechanism using top-k gating and Gumbel-Sigmoid. This approach dynamically filters relevant features, reducing redundancy and improving detection accuracy. Extensive experiments on the FakeSV and FVC-2018 datasets demonstrate that MisD-MoE significantly outperforms state-of-the-art methods, with accuracy improvements of 3.45% and 3.71% on the respective datasets compared to baseline models.
Authors:
Woomin Song, Jihoon Tack, Sangwoo Mo, Seunghyuk Oh, Jinwoo Shin
Abstract:State-space models (SSMs) offer a promising architecture for sequence modeling, providing an alternative to Transformers by replacing expensive self-attention with linear recurrences. In this paper, we propose a simple yet effective trick to enhance SSMs within given computational budgets by sparsifying them. Our intuition is that tokens in SSMs are highly redundant due to gradual recurrent updates, and dense recurrence operations block the delivery of past information. In particular, we observe that upper layers of SSMs tend to be more redundant as they encode global information, while lower layers encode local information. Motivated by this, we introduce Simba, a hierarchical sparsification method for SSMs based on token pruning. Simba sparsifies upper layers more than lower layers, encouraging the upper layers to behave like highways. To achieve this, we propose a novel token pruning criterion for SSMs, measuring the global impact of tokens on the final output by accumulating local recurrences. We demonstrate that Simba outperforms the baseline model, Mamba, with the same FLOPS in various natural language tasks. Moreover, we illustrate the effect of highways, showing that Simba not only enhances efficiency but also improves the information flow across long sequences.
Authors:
Namrata Shivagunde, Mayank Kulkarni, Giannis Karamanolakis, Jack FitzGerald, Yannick Versley, Saleh Soltan, Volkan Cevher, Jianhua Lu, Anna Rumshisky
Abstract:Large language models (LLMs) have achieved remarkable performance on various natural language processing tasks, but training LLMs at scale is extremely resource-intensive, requiring substantial computational power, memory, and energy consumption. This has motivated research into efficient training methods, particularly during the pre-training phase. Two main approaches to approximate full-rank training have emerged to address this challenge: low-rank model decomposition (e.g., ReLoRA) and memory-efficient optimizers (e.g., GaLore). In this work, we systematically evaluate both lines of research on a range of metrics, including validation perplexity, memory usage and throughput. Additionally, we propose improvements to both: low-rank decomposition methods, by improving the low-rank matrix decomposition and initialization, and memory-efficient optimizer methods, via the introduction of error feedback and dynamic update steps. Our comprehensive evaluation under the same experimental setting shows that our proposed optimizations outperform all previous methods, achieving almost the same throughput as full-rank training, saving 9% in memory traded off for a 1.5% increase in validation perplexity.
Authors:
Kai Yang, Vahid Partovi Nia, Boxing Chen, Masoud Asgharian
Abstract:Lightweight language models, such as TinyBERT 14.5M, have emerged as a critical area of research because of their implementation on resource-constrained hardware. These transformer models have significantly smaller parameter sizes and reduced memory and computational requirements. These features make such models highly suitable for deployment on small devices. We explore the concept of parameter sharing between the key and query weight matrices of a transformer model. Full query-key sharing, which has already been proposed in the literature, introduces a fully quadratic attention matrix, oversimplifies directional dependencies, and degrades pre-training loss. In contrast, partial parameter sharing balances complexity reduction and performance retention. Partial parameter sharing effectively addresses over-fitting while maintaining strong performance even with a high degree of shared parameters, up to 95%. This provides a promising strategy for enhancing language models, specifically targeting small models.
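A sketch of partial query-key sharing for one attention projection (the row-wise sharing granularity and the 50% default are illustrative, not the paper's exact construction):

```python
import torch
import torch.nn as nn

class PartiallySharedQK(nn.Module):
    """A fraction of the output dimensions is produced by a projection shared
    between queries and keys; the remaining dimensions keep separate weights."""

    def __init__(self, d_model, shared_frac=0.5):
        super().__init__()
        d_shared = int(d_model * shared_frac)
        d_own = d_model - d_shared
        self.shared = nn.Linear(d_model, d_shared, bias=False)  # reused for Q and K
        self.q_own = nn.Linear(d_model, d_own, bias=False)
        self.k_own = nn.Linear(d_model, d_own, bias=False)

    def forward(self, x):
        s = self.shared(x)
        q = torch.cat([s, self.q_own(x)], dim=-1)
        k = torch.cat([s, self.k_own(x)], dim=-1)
        return q, k
```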
Authors:
Ali Saheb Pasand, Pouya Bashivan
Abstract:Training and fine-tuning Large Language Models (LLMs) require significant memory due to the substantial growth in the size of weight parameters and optimizer states. While methods like low-rank adaptation (LoRA), which introduce low-rank trainable modules in parallel to frozen pre-trained weights, effectively reduce memory usage, they often fail to preserve the optimization trajectory and are generally less effective for pre-training models. On the other hand, approaches such as GaLore, which project gradients onto lower-dimensional spaces, maintain the training trajectory and perform well in pre-training but suffer from high computational complexity, as they require repeated singular value decomposition on large matrices. In this work, we propose Randomized Gradient Projection (RGP), which outperforms GaLore, the current state-of-the-art in efficient fine-tuning, on the GLUE task suite, while being 74% faster on average and requiring similar memory.
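The core projection step can be sketched as follows (illustrative only: optimizer state handling, periodic re-sampling of the projection, and RGP's exact scaling are omitted):

```python
import numpy as np

def random_projection_update(grad, rank, rng):
    """Compress a gradient matrix with a random Gaussian projection instead of
    an SVD-derived subspace, run the optimizer in the low-rank space, then
    project back to the full parameter space."""
    m, n = grad.shape
    P = rng.standard_normal((m, rank)) / np.sqrt(rank)  # cheap random projection
    low_rank_grad = P.T @ grad        # (rank, n): what the optimizer would see
    # ... optimizer step on low_rank_grad would go here ...
    return P @ low_rank_grad          # back-projected update direction

# usage: random_projection_update(grad, rank=64, rng=np.random.default_rng(0))
```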
Authors:
Vicky Zayats, Peter Chen, Melissa Ferrari, Dirk Padfield
Abstract:Integrating multiple generative foundation models, especially those trained on different modalities, into something greater than the sum of its parts poses significant challenges. Two key hurdles are the availability of aligned data (concepts that contain similar meaning but are expressed differently in different modalities), and effectively leveraging unimodal representations in cross-domain generative tasks, without compromising their original unimodal capabilities. We propose Zipper, a multi-tower decoder architecture that addresses these concerns by using cross-attention to flexibly compose multimodal generative models from independently pre-trained unimodal decoders. In our experiments fusing speech and text modalities, we show the proposed architecture performs very competitively in scenarios with limited aligned text-speech data. We also showcase the flexibility of our model to selectively maintain unimodal (e.g., text-to-text) generation performance by freezing the corresponding modal tower (e.g., text). In cross-modal tasks such as automatic speech recognition (ASR) where the output modality is text, we show that freezing the text backbone results in negligible performance degradation. In cross-modal tasks such as text-to-speech generation (TTS) where the output modality is speech, we show that using a pre-trained speech backbone results in superior performance to the baseline.
Authors:
Xiaofan Lu, Yixiao Zeng, Feiyang Ma, Zixu Yu, Marco Levorato
Abstract:Speculative Decoding (SD) is a technique to accelerate the inference of Large Language Models (LLMs) by using a lower complexity draft model to propose candidate tokens verified by a larger target model. To further improve efficiency, Multi-Candidate Speculative Decoding (MCSD) improves upon this by sampling multiple candidate tokens from the draft model at each step and verifying them in parallel, thus increasing the chances of accepting a token and reducing generation time. Existing MCSD methods rely on the draft model to initialize the multi-candidate sequences and use a static length and tree attention structure for draft generation. However, such an approach suffers from the draft and target model’s output distribution differences, especially in a dynamic generation context. In this work, we introduce a new version of MCSD that includes a target model initialized multi-candidate generation, a dynamic sliced topology-aware causal mask for dynamic length adjustment, and decision models to optimize early stopping. We experimented with our method on Llama 2-7B and its variants and observed a maximum 27.5% speedup compared to our MCSD baseline across three benchmarks with Llama 2-7B as the target model and JackFram 68M as the draft model. Additionally, we evaluate the effects of using the target model initialized multi-candidate process with different draft models on output quality. Our original code is available on GitHub.
Authors:
Amrit Khera, Rajat Ghosh, Debojyoti Dutta
Abstract:LLM alignment ensures that large language models behave safely and effectively by aligning their outputs with human values, goals, and intentions. Aligning LLMs requires huge amounts of data, computation, and time. Moreover, curating data with human feedback is expensive and takes time. Recent research depicts the benefit of data engineering in the fine-tuning and pre-training paradigms to bring down such costs. However, alignment differs from the aforementioned paradigms, and it is unclear if data-efficient alignment is feasible. In this work, we first aim to understand how the performance of LLM alignment scales with data. We find that LLM alignment performance follows an exponential plateau pattern which tapers off after a rapid initial increase. Based on this, we identify data subsampling as a viable method to reduce the resources required for alignment. Further, we propose an information theory-based methodology for efficient alignment by identifying a small high-quality subset, thereby reducing the computation and time required by alignment. We evaluate the proposed methodology over multiple datasets and compare the results. We find that the model aligned using our proposed methodology outperforms other sampling methods and performs comparably to the model aligned with the full dataset while using less than 10% of the data, leading to greater than 90% savings in costs and resources, and faster LLM alignment.
Authors:
Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qiang Liu
Abstract:Modern large language models are built on sequence modeling via next-token prediction. While the Transformer remains the dominant architecture for sequence modeling, its quadratic decoding complexity in sequence length poses a major limitation. State-space models (SSMs) present a competitive alternative, offering linear decoding efficiency while maintaining parallelism during training. However, most existing SSMs rely on linear recurrence designs that appear somewhat ad hoc. In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems. This approach links SSM design to formulating precise online learning objectives, with state transition rules derived from solving these objectives. Based on this insight, we introduce a novel deep SSM architecture, Longhorn, whose update resembles the closed-form solution for solving the online associative recall problem. Our experimental results show that Longhorn outperforms state-of-the-art SSMs, including the Mamba model, on standard sequence modeling benchmarks, language modeling, and vision tasks. Specifically, Longhorn achieves a 1.8x improvement in sample efficiency compared to Mamba, and can extrapolate over contexts that are up to 16x longer during inference. The code is provided at https://github.com/Cranial-XIX/Longhorn.
Authors:
Harry Dong, Tyler Johnson, Minsik Cho, Emad Soroush
Abstract:Tensor parallelism provides an effective way to increase server large language model (LLM) inference efficiency despite adding an additional communication cost. However, as server LLMs continue to scale in size, they will need to be distributed across more devices, magnifying the communication cost. One way to approach this problem is with quantization, but current methods for LLMs tend to avoid quantizing the features that tensor parallelism needs to communicate. Taking advantage of consistent outliers in communicated features, we introduce a quantization method that reduces communicated values on average from 16 bits to 4.2 bits while preserving nearly all of the original performance. For instance, our method maintains around 98.0% and 99.5% of Gemma 2 27B’s and Llama 2 13B’s original performance, respectively, averaged across all tasks we evaluated on.
Authors:
Luning Wang, Shiyao Li, Xuefei Ning, Zhihang Yuan, Shengen Yan, Guohao Dai, Yu Wang
Abstract:Large Language Models (LLMs) have been widely adopted to process long-context tasks. However, the large memory overhead of the key-value (KV) cache poses significant challenges in long-context scenarios. Existing training-free KV cache compression methods typically focus on quantization and token pruning, which have compression limits, and excessive sparsity can lead to severe performance degradation. Other methods design new architectures with less KV overhead but require significant training overhead. To address the above two drawbacks, we further explore the redundancy in the channel dimension and apply an architecture-level design with minor training costs. Therefore, we introduce CSKV, a training-efficient Channel Shrinking technique for KV cache compression: (1) We first analyze the singular value distribution of the KV cache, revealing significant redundancy and compression potential along the channel dimension. Based on this observation, we propose using low-rank decomposition for key and value layers and storing the low-dimension features. (2) To preserve model performance, we introduce a bi-branch KV cache, including a window-based full-precision KV cache and a low-precision compressed KV cache. (3) To reduce the training costs, we minimize the layer-wise reconstruction loss for the compressed KV cache instead of retraining the entire LLMs. Extensive experiments show that CSKV can reduce the memory overhead of the KV cache by 80% while maintaining the model’s long-context capability. Moreover, we show that our method can be seamlessly combined with quantization to further reduce the memory overhead, achieving a compression ratio of up to 95%. Code is available at https://github.com/wln20/CSKV.
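Step (1) above, low-rank channel shrinking of a key/value projection, can be sketched with a truncated SVD; the bi-branch cache and the reconstruction-loss fine-tuning from steps (2)-(3) are omitted:

```python
import numpy as np

def shrink_kv_projection(W, rank):
    """Factor a key/value projection W (d_out, d_in) so that only the
    rank-dimensional intermediate features need to be cached and the
    full-width keys/values are rebuilt on the fly at attention time."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    down = Vt[:rank]                   # (rank, d_in): applied when writing the cache
    up = U[:, :rank] * s[:rank]        # (d_out, rank): applied when reading the cache
    return down, up

# cache `down @ x` (rank-dim) instead of `W @ x`; recover with `up @ (down @ x)`
```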
Authors:
Zain Sarwar, Ashwinee Panda, Benjamin Thérien, Stephen Rawls, Anirban Das, Kartik Balasubramanian, Berkcan Kapusuzoglu, Shixiong Zhang, Sambit Sahu, Milind Naphade, Supriyo Chakraborty
Abstract:We introduce StructMoE, a method to scale MoEs by augmenting experts with dynamic capacity using structured matrices we call Low Rank Experts (LoRE). These LoREs are selected on a per-expert and per-token basis using a secondary router specific to every expert and are entangled with the main expert in the up-projection phase of the expert before the activation function. Empirically, we find this approach to outperform a parameter-matched MoE baseline in terms of loss on a held-out validation set.
Authors:
Mingyu Derek Ma, Yanna Ding, Zijie Huang, Jianxi Gao, Yizhou Sun, Wei Wang
Abstract:Generative Language Models rely on autoregressive decoding to produce the output sequence token by token. Many tasks, such as preference optimization, require the model to produce task-level output consisting of multiple tokens directly by selecting candidates from a pool as predictions. Determining a task-level prediction from candidates using the ordinary token-level decoding mechanism is constrained by time-consuming decoding and gradients interrupted by discrete token selection. Existing works have been using decoding-free candidate selection methods to obtain candidate probability from initial output logits over the vocabulary. Though these estimation methods are widely used, they are not systematically evaluated, especially on end tasks. We introduce an evaluation of a comprehensive collection of decoding-free candidate selection approaches on a comprehensive set of tasks, including five multiple-choice QA tasks with a small candidate pool and four clinical decision tasks with a massive number of candidates, some with 10k+ options. We evaluate the estimation methods paired with a wide spectrum of foundation LMs covering different architectures, sizes and training paradigms. The results and insights from our analysis could inform future model design.
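One simple decoding-free estimator of the kind being evaluated can be sketched as follows (scoring each multi-token candidate by the mean log-probability of its tokens under a single forward pass; this is just one illustrative variant, not the full set of methods compared in the paper):

```python
import numpy as np

def score_candidates(first_step_logits, candidate_token_ids):
    """first_step_logits: (vocab_size,) logits from one forward pass.
    candidate_token_ids: list of token-id lists, one per candidate.
    Returns the index of the highest-scoring candidate without any decoding."""
    z = first_step_logits - first_step_logits.max()
    logprobs = z - np.log(np.exp(z).sum())          # log-softmax over vocabulary
    scores = [np.mean([logprobs[t] for t in cand]) for cand in candidate_token_ids]
    return int(np.argmax(scores))
```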
Authors:
Theodore Glavas, Joud Chataoui, Florence Regol, Wassim Jabbour, Antonios Valkanas, Boris N. Oreshkin, Mark Coates
Abstract:The vast size of Large Language Models (LLMs) has prompted a search to optimize inference. One effective approach is dynamic inference, which adapts the architecture to the sample-at-hand to reduce the overall computational cost. We empirically examine two common dynamic inference methods for natural language generation (NLG): layer skipping and early exiting. We find that a pre-trained decoder-only model is significantly more robust to layer removal via layer skipping, as opposed to early exit. We demonstrate the difficulty of using hidden state information to adapt computation on a per-token basis for layer skipping. Finally, we show that dynamic computation allocation on a per-sequence basis holds promise for significant efficiency gains by constructing an oracle controller. Remarkably, we find that there exists an allocation which achieves equal performance to the full model using only 23.3% of its layers on average.
Authors:
Rambod Azimi, Rishav Rishav, Marek Teichmann, Samira Ebrahimi Kahou
Abstract:Large language models (LLMs) have demonstrated remarkable performance across various downstream tasks. However, the high computational and memory requirements of LLMs are a major bottleneck. To address this, parameter-efficient fine-tuning (PEFT) methods such as low-rank adaptation (LoRA) have been proposed to reduce computational costs while ensuring minimal loss in performance. Additionally, knowledge distillation (KD) has been a popular choice for obtaining compact student models from teacher models. In this work, we present KD-LoRA, a novel fine-tuning method that combines LoRA with KD. Our results demonstrate that KD-LoRA achieves performance comparable to full fine-tuning (FFT) and LoRA while significantly reducing resource requirements. Specifically, KD-LoRA retains 98% of LoRA’s performance on the GLUE benchmark, while being 40% more compact. Additionally, KD-LoRA reduces GPU memory usage by 30% compared to LoRA, while decreasing inference time by 30% compared to both FFT and LoRA. We evaluate KD-LoRA across three encoder-only models: BERT, RoBERTa, and DeBERTaV3. Code is available at https://github.com/rambodazimi/KD-LoRA.
Authors:
Sasha Doubov, Nikhil Sardana, Vitaliy Chiley
Abstract:Small, highly trained, open-source large language models are widely used due to their inference efficiency, but further improving their quality remains a challenge. Sparse upcycling is a promising approach that transforms a pretrained dense model into a Mixture-of-Experts (MoE) architecture, increasing the model’s parameter count and quality. In this work, we compare the effectiveness of sparse upcycling against continued pretraining (CPT) across different model sizes, compute budgets, and pretraining durations. Our experiments show that sparse upcycling can achieve better quality, with improvements of over 20% relative to CPT in certain scenarios. However, this comes with a significant inference cost, leading to 40% slowdowns in high-demand inference settings for larger models. Our findings highlight the trade-off between model quality and inference efficiency, offering insights for practitioners seeking to balance model quality and deployment constraints.
Authors:
Taehyeon Kim, Hojung Jung, Se-Young Yun
Abstract:Speculative decoding (SD) has emerged as a promising approach to accelerate inference in large language models (LLMs). This method drafts potential future tokens by leveraging a smaller model, while these tokens are concurrently verified by the target LLM, ensuring only outputs aligned with the target LLM’s predictions are accepted. However, the inherent limitations of individual drafters, especially when trained on specific tasks or domains, can hinder their effectiveness across diverse applications. In this paper, we introduce a simple yet efficient unified framework, termed MetaSD, that incorporates multiple drafters into the speculative decoding process to address this limitation. Our approach employs multi-armed bandit sampling to dynamically allocate computational resources across various drafters, thereby improving overall generation performance. Through extensive experiments, we demonstrate that our unified framework achieves superior results compared to traditional single-drafter approaches.
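A minimal sketch of bandit-style drafter selection (a softmax-over-average-reward sampler with accepted tokens as the reward; MetaSD's actual bandit algorithm and reward definition may differ):

```python
import numpy as np

def choose_drafter(counts, rewards, rng, temperature=1.0):
    """counts, rewards: numpy arrays with one entry per drafter.
    Sample a drafter in proportion to its running average reward so that
    better drafters are used more often while the others stay explored."""
    means = rewards / np.maximum(counts, 1)
    z = means / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    return rng.choice(len(counts), p=probs)

def update_drafter(counts, rewards, arm, num_accepted):
    """After a speculation round, credit the chosen drafter with its acceptances."""
    counts[arm] += 1
    rewards[arm] += num_accepted

# usage: counts = np.zeros(num_drafters); rewards = np.zeros(num_drafters)
```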
Authors:
Ankur Kumar
Abstract:KV cache compression methods have mainly relied on scalar quantization techniques to reduce the memory requirements during decoding. In this work, we apply residual vector quantization, which has been widely used for high fidelity audio compression, to compress the KV cache in large language models (LLM). We adapt the standard recipe with minimal changes to compress the output of any key or value projection matrix in a pretrained LLM: we scale the vector by its standard deviation, divide channels into groups and then quantize each group with the same residual vector quantizer. We learn the codebook using exponential moving average, and there are no other learnable parameters, including the input and output projections normally used in a vector quantization setup. We find that a residual depth of 8 recovers most of the performance of the unquantized model. We also find that grouping non-contiguous channels together works better than grouping contiguous channels for compressing the key matrix, and the method further benefits from a lightweight finetuning of the LLM together with the quantization. Overall, the proposed technique is competitive with existing quantization methods while being much simpler and results in 5.5x compression compared to half precision.
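To make the recipe concrete, below is a minimal NumPy sketch of grouped residual vector quantization for a single key or value vector; the codebook contents, group size, and residual depth are assumptions for the example, not the paper's exact configuration.

```python
import numpy as np

def residual_vq_encode(x, codebooks, group_size):
    """Quantize one key/value vector with grouped residual VQ.

    x          : (d,) output of a K/V projection for a single token
    codebooks  : list of (num_codes, group_size) arrays, one per residual level
    group_size : channels per group; all groups share the same codebooks
    """
    scale = x.std() + 1e-8                  # per-vector scaling, as in the recipe above
    residual = (x / scale).reshape(-1, group_size)
    codes = []
    for cb in codebooks:                    # residual depth = len(codebooks), e.g. 8
        d2 = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = d2.argmin(axis=1)             # nearest codeword per group
        codes.append(idx)
        residual = residual - cb[idx]       # quantize what is left at the next level
    return scale, np.stack(codes)           # store only the scale and integer codes

def residual_vq_decode(scale, codes, codebooks):
    groups = sum(cb[idx] for cb, idx in zip(codebooks, codes))
    return groups.reshape(-1) * scale
```

Only the integer codes and a single scale are cached per vector; the codebooks themselves would be learned with an EMA update during a calibration pass.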
Authors:
Habib Hajimolahoseini, Shuangyue Wen, Walid Ahmed, Yang Liu
Abstract:Tensor decomposition is a mathematically supported technique for data compression. It consists of applying a low rank decomposition technique on the tensors or matrices in order to reduce the redundancy of the data. However, it is not a popular technique for compressing AI models due to the high number of new layers added to the architecture after decomposition. Although the number of parameters could shrink significantly, the model could end up more than twice as deep, which adds latency to training or inference. In this paper, we present a comprehensive study of how to modify the low rank decomposition technique in AI models so that we benefit from both high accuracy and low memory consumption, as well as faster training and inference.
Authors:
Bhawna Paliwal, Deepak Saini, Mudit Dhawan, Siddarth Asokan, Nagarajan Natarajan, Surbhi Aggarwal, Pankaj Malhotra, Jian Jiao, Manik Varma
Abstract:Ranking a set of items based on their relevance to a given query is a core problem in search and recommendation. Transformer-based ranking models are the state-of-the-art approaches for such tasks, but they score each query-item pair independently, ignoring the joint context of other relevant items. This leads to sub-optimal ranking accuracy and high computational costs. We address this by proposing Cross-encoders with Joint Efficient Modeling (CROSS-JEM), a novel ranking approach that enables transformer-based models to jointly score multiple items for a query, maximizing parameter utilization. CROSS-JEM leverages (a) redundancies and token overlaps to jointly score multiple items, which are typically short-text phrases arising in search and recommendations, and (b) a novel training objective that models ranking probabilities. CROSS-JEM achieves state-of-the-art accuracy and over 4x lower ranking latency compared to standard cross-encoders. Our contributions are threefold: (i) we highlight the gap between the ranking application’s need for scoring thousands of items per query and the limited capabilities of current cross-encoders; (ii) we introduce CROSS-JEM for joint efficient scoring of multiple items per query; and (iii) we demonstrate state-of-the-art accuracy on standard public datasets and a proprietary dataset. CROSS-JEM opens up new directions for designing tailored early-attention-based ranking models that incorporate strict production constraints such as item multiplicity and latency.
Authors:
Abderrahim Fathan, Xiaolin Zhu, Jahangir Alam
Abstract:Using clustering-driven annotations to train a neural network can be a tricky task because of label noise. In this paper, we propose a dynamic and adaptive label noise filtering method, called AdaptiveDrop, which combines both label noise cleansing and correction simultaneously in cascade to combine their advantages. Contrary to other label noise filtering approaches, our method filters noisy samples on the fly from an early stage of training. We also provide a variant that incorporates sub-centers per class for enhanced robustness to label noise by continuously tracking the dominant sub-centers via a dictionary table. AdaptiveDrop is a simple, general-purpose method, performed end-to-end in only one stage of training; it can be integrated with any loss function and does not require training from scratch on the cleansed dataset. We show through extensive ablation studies for the self-supervised speaker verification task that our method is effective, benefits from long epochs of iterative filtering, and provides consistent performance gains across various loss functions and real-world pseudo-labels.
Authors:
Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi
Abstract:Speculative decoding is a prominent technique to accelerate large language model inference by leveraging predictions from an auxiliary draft model. While effective, in application-specific settings it often involves fine-tuning both draft and target models to achieve high acceptance rates. As the number of downstream tasks grows, draft models add significant complexity to inference systems. Recently, several single-model architectures (viz., Medusa) have been proposed to speculate tokens in a non-autoregressive manner; however, their effectiveness is limited due to the lack of dependency between speculated tokens. We introduce a novel speculative decoding method that integrates drafting within the target model by using multi-stream attention and incorporates future token planning into the supervised fine-tuning objective. To the best of our knowledge, this is the first parameter-efficient approach that scales well with an increasing number of downstream tasks while enhancing downstream metrics and achieving high acceptance rates, attributable to the interdependence among the speculated tokens. Speculative Streaming speeds up decoding by 1.9 - 3X in a diverse set of tasks, such as Summarization, Structured Queries, and Meaning Representation, while improving generation quality and using ∼10000X fewer extra parameters than alternative architectures, making it ideal for resource-constrained devices. Our approach can also be effectively deployed in lossless settings for generic chatbot applications that do not necessitate supervised fine-tuning. In such setups, we achieve 2.9 - 3.2X speedup while maintaining the integrity of the base model’s output.
Authors:
Saber Malekmohammadi, Golnoosh Farnadi
Abstract:A significant approach in natural language processing involves large-scale pre-training of models on general domain data followed by their adaptation to specific tasks or domains. As models grow in size, full fine-tuning of all of their parameters becomes increasingly impractical. To address this, some methods for low-rank task adaptation of language models have been proposed, e.g., LoRA and FLoRA. These methods keep the pre-trained model weights fixed and incorporate trainable low-rank decomposition matrices into some layers of the transformer architecture, called adapters. This approach significantly reduces the number of trainable parameters required for downstream tasks compared to full fine-tuning of all parameters. In this work, we look at low-rank adaptation from the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA and FLoRA is equivalent to injecting some random noise into the batch gradients w.r.t. the adapter parameters, and we quantify the variance of the injected noise. By establishing a Berry-Esseen type bound on the total variation distance between the distribution of the injected noise and a Gaussian distribution with the same variance, we show that the dynamics of low-rank adaptation is close to that of differentially private fine-tuning of the adapters. Finally, using the Johnson-Lindenstrauss lemma, we show that when augmented with gradient scaling, low-rank adaptation is very close to performing the DPSGD algorithm with a fixed noise scale to fine-tune the adapters. These theoretical findings suggest that, unlike other existing fine-tuning algorithms, low-rank adaptation provides privacy w.r.t. the fine-tuning data implicitly.
Authors:
Habib Hajimolahoseini, Shuangyue Wen, Walid Ahmed, Yang Liu
Abstract:In this paper, we present a comprehensive study and propose several novel techniques for implementing 3D convolutional blocks using 2D and/or 1D convolutions with only 4D and/or 3D tensors. Our motivation is that 3D convolutions with 5D tensors are computationally very expensive and they may not be supported by some of the edge devices used in real-time applications such as robots. The existing approaches mitigate this by splitting the 3D kernels into spatial and temporal domains, but they still use 3D convolutions with 5D tensors in their implementations. We resolve this issue by introducing some appropriate 4D/3D tensor reshaping as well as new combination techniques for spatial and temporal splits. The proposed implementation methods show significant improvement both in terms of efficiency and accuracy. The experimental results confirm that the proposed spatio-temporal processing structure outperforms the original model in terms of speed and accuracy using only 4D tensors with fewer parameters.
Authors:
Moshe Kimhi, Avi Mendelson, Idan Kashani, Chaim Baskin
Abstract:The widely used ReLU is favored for its hardware efficiency, as its implementation at inference reduces to a one-bit sign check, yet it suffers from issues such as the “dying ReLU” problem, where during training, neurons fail to activate and constantly remain at zero, as highlighted by Lu et al. [16]. Traditional approaches to mitigate this issue often introduce more complex and less hardware-friendly activation functions. In this work, we propose the Hysteresis Rectified Linear Unit (HeLU), an efficient activation function designed to address the “dying ReLU” problem with minimal complexity. Unlike traditional activation functions with fixed thresholds for training and inference, HeLU employs a variable threshold that refines the backpropagation. This refined mechanism allows simpler activation functions to achieve competitive performance comparable to their more complex counterparts without introducing unnecessary complexity or requiring inductive biases. Empirical evaluations demonstrate that HeLU enhances model generalization across diverse datasets, offering a promising solution for efficient and effective inference suitable for a wide range of neural network architectures.
Authors:
Oscar Key, Luka Ribar, Alberto Cattaneo, Luke Hudlass-Galley, Douglas Orr
Abstract:We present an evaluation of bucketed approximate top-k algorithms. Computing top-k exactly suffers from limited parallelism, because the k largest values must be aggregated along the vector, and thus is not well suited to computation on highly-parallel machine learning accelerators. By relaxing the requirement that the top-k is exact, bucketed algorithms can dramatically increase the parallelism available by independently computing many smaller top-k operations. We explore the design choices of this class of algorithms using both theoretical analysis and empirical evaluation on downstream tasks. Our motivating examples are sparsity algorithms for language models, which often use top-k to select the most important parameters or activations. We also release a fast bucketed top-k implementation for PyTorch.
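To make the bucketed idea concrete, here is a minimal PyTorch sketch of an approximate top-k that splits the input into interleaved buckets and takes a smaller top-k from each bucket independently; the bucket count and interleaved assignment are illustrative choices, not the released implementation.

```python
import torch

def bucketed_topk(x, k, num_buckets):
    """Approximate top-k: element i goes to bucket i % num_buckets, and each bucket
    returns its own top-(k / num_buckets). The result may miss some of the true top-k
    if they cluster in one bucket, which is the accepted approximation."""
    n = x.shape[-1]
    assert n % num_buckets == 0 and k % num_buckets == 0
    k_per_bucket = k // num_buckets
    buckets = x.reshape(*x.shape[:-1], n // num_buckets, num_buckets).transpose(-1, -2)
    vals, idx = buckets.topk(k_per_bucket, dim=-1)          # independent per-bucket top-k
    bucket_ids = torch.arange(num_buckets, device=x.device).view(
        *([1] * (x.dim() - 1)), num_buckets, 1)
    orig_idx = idx * num_buckets + bucket_ids                # map back to positions in x
    return vals.flatten(-2), orig_idx.flatten(-2)

# usage: values, indices = bucketed_topk(torch.randn(1024), k=64, num_buckets=8)
```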
Authors:
Alessio Devoto, Yu Zhao, Simone Scardapane, Pasquale Minervini
Abstract:The deployment of large language models (LLMs) is often hindered by the extensive memory requirements of the Key-Value (KV) cache, especially as context lengths increase. Existing approaches to reduce the KV cache size involve either fine-tuning the model to learn a compression strategy or leveraging attention scores to reduce the sequence length. We analyse the attention distributions in decoder-only Transformer-based models and observe that attention allocation patterns stay consistent across most layers. Surprisingly, we find a clear correlation between the L2 norm and the attention scores over cached KV pairs, where a low L2 norm of a key embedding usually leads to a high attention score during decoding. This finding indicates that the influence of a KV pair is potentially determined by the key embedding itself before being queried. Based on this observation, we compress the KV cache based on the L2 norm of key embeddings. Our experimental results show that this simple strategy can reduce the KV cache size by 50% on language modelling and needle-in-a-haystack tasks and 90% on passkey retrieval tasks without losing accuracy. Moreover, without relying on the attention scores, this approach remains compatible with FlashAttention, enabling broader applicability.
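A minimal PyTorch sketch of the key-norm heuristic described above, keeping only the cached KV pairs whose key embeddings have the lowest L2 norm; the keep ratio and the absence of any recency protection are simplifying assumptions, not the paper's exact policy.

```python
import torch

def compress_kv_by_key_norm(keys, values, keep_ratio=0.5):
    """Retain the KV pairs with the smallest key L2 norms (low norm correlates with
    high attention in the observation above).

    keys, values: (batch, heads, seq_len, head_dim)
    """
    seq_len = keys.shape[2]
    keep = max(1, int(seq_len * keep_ratio))
    norms = keys.norm(dim=-1)                                  # (batch, heads, seq_len)
    keep_idx = norms.topk(keep, dim=-1, largest=False).indices
    keep_idx = keep_idx.sort(dim=-1).values                    # preserve positional order
    gather_idx = keep_idx.unsqueeze(-1).expand(-1, -1, -1, keys.shape[-1])
    return keys.gather(2, gather_idx), values.gather(2, gather_idx), keep_idx
```

Because the decision uses only the keys themselves, no attention scores are needed, which is what keeps the scheme compatible with fused attention kernels.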
Authors:
Lianming Huang, Shangyu Wu, Yufei Cui, Ying Xiong, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue
Abstract:Deploying large language models for inference remains challenging due to their high computational overhead. Early exiting optimizes model inference by adaptively reducing the number of inference layers. Existing methods typically train internal classifiers to determine whether to exit at intermediate layers. However, such classifier-based early exiting frameworks require significant effort to train the classifiers while achieving only comparable performance at best. To address these limitations, this paper proposes RAEE, a robust Retrieval-Augmented Early Exiting framework for efficient inference. First, this paper demonstrates that the early exiting problem can be modeled as a distribution prediction problem, where the distribution is approximated using similar data’s exiting information. Then, this paper details the process of collecting exiting information to build the retrieval database. Finally, based on the pre-built retrieval database, RAEE leverages the retrieved similar data’s exiting information to guide the backbone model to exit at the layer predicted by the approximated distribution. Experimental results demonstrate that the proposed RAEE can significantly accelerate inference. More importantly, RAEE can also achieve robust zero-shot performance on 8 downstream tasks.
Authors:
Agniv Sharma, Jonas Geiping
Abstract:Transformers are widely used across various applications, many of which yield sparse or partially filled attention matrices. Examples include attention masks designed to reduce the quadratic complexity of attention, sequence packing techniques, and recent innovations like tree masking for fast validation in MEDUSA. Despite the inherent sparsity in these matrices, the state-of-the-art algorithm Flash Attention still processes them with quadratic complexity as though they were dense. In this paper, we introduce Binary Block Masking, a highly efficient modification that enhances Flash Attention by making it mask-aware. We further propose two optimizations: one tailored for masks with contiguous non-zero patterns and another for extremely sparse masks. Our experiments on attention masks derived from real-world scenarios demonstrate up to a 9x runtime improvement. The implementation will be publicly released to foster further research and application.
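To illustrate the underlying idea, the sketch below reduces a dense attention mask to a block-level occupancy map that a tiled attention kernel could consult to skip all-zero blocks; the block size and the divisibility assumption are illustrative simplifications, not the released kernel.

```python
import torch

def binary_block_mask(mask, block_size=128):
    """Summarize a dense boolean (q_len, k_len) attention mask at block granularity.

    Returns a (q_blocks, k_blocks) boolean map that is True where a tile contains at
    least one unmasked entry; a mask-aware kernel would only compute those tiles.
    """
    q_len, k_len = mask.shape
    assert q_len % block_size == 0 and k_len % block_size == 0  # keep the sketch simple
    blocks = mask.reshape(q_len // block_size, block_size,
                          k_len // block_size, block_size)
    return blocks.any(dim=3).any(dim=1)
```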
Authors:
Jacob K. Christopher, Brian R. Bartoldson, Tal Ben-Nun, Michael Cardei, Bhavya Kailkhura, Ferdinando Fioretto
Abstract:Speculative decoding has emerged as a widely adopted method to accelerate large language model inference without sacrificing the quality of the model outputs. While this technique has facilitated notable speed improvements by enabling parallel sequence verification, its efficiency remains inherently limited by the reliance on incremental token generation in existing draft models. To overcome this limitation, this paper proposes an adaptation of speculative decoding which uses discrete diffusion models to generate draft sequences. This allows parallelization of both the drafting and verification steps, providing significant speedups to the inference process. Our proposed approach, Speculative Diffusion Decoding (SpecDiff), is validated on standard language generation benchmarks and empirically demonstrated to provide up to 7.2x speedups over standard generation processes and up to 1.75x speedups over existing speculative decoding approaches.
Authors:
Keivan Alizadeh, Iman Mirzadeh, Hooman Shahrokhi, Dmitry Belenko, Chenfan Sun, Minsik Cho, Mohammad Sekhavat, Moin Nabi, Mehrdad Farajtabar
Abstract:Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget, leading to inefficient resource utilization. To address this shortcoming, recent advancements in mixture of expert (MoE) models, speculative decoding, and early exit strategies leverage the insight that computational demands can vary significantly based on the complexity and nature of the input. However, identifying optimal routing patterns for dynamic execution remains an open challenge, limiting the full potential of these adaptive methods. To address this need, we study adaptive computation in LLMs more systematically. We propose a novel framework that integrates smaller auxiliary modules within each Feed-Forward Network layer of the LLM. This design enables dynamic routing of tokens based on task complexity: tokens can be processed by either the small or big modules at each layer, or even bypass certain layers entirely. This allows us to introduce a novel notion of a token’s difficulty, defined by its potential to benefit from additional computational resources. Importantly, by employing oracles to identify optimal patterns of adaptive computations, we gain valuable insights into the internal workings of LLMs and the routing processes in a simplified heterogeneous MoE setup. We show that trained routers operate differently from oracles and often yield suboptimal solutions. Notably, activating a large module in just one layer outperforms models that use large modules across all layers, underscoring the gap between practical implementations of routing in MoE models and theoretical optima for adaptive computation.
Authors:
Adithya Vasudev
Abstract:The Lottery Ticket hypothesis proposes that ideal, sparse subnetworks, called lottery tickets, exist in untrained dense neural networks. The Early Bird hypothesis proposes an efficient algorithm to find these winning lottery tickets in convolutional neural networks, using the novel concept of distance between subnetworks to detect convergence in the subnetworks of a model. However, this approach overlooks unchanging groups of unimportant neurons near the search’s end. We propose WORM, a method that exploits these static groups by truncating their gradients, forcing the model to rely on other neurons. Experiments show WORM achieves faster ticket identification during training on convolutional neural networks, despite the additional computational overhead, when compared to EarlyBird Search. Additionally, WORM-pruned models lose less accuracy during pruning and recover accuracy faster, improving the robustness of a given model. Furthermore, WORM is also able to generalize the Early Bird hypothesis reasonably well to larger models, such as transformers, displaying its flexibility to adapt to more complex architectures.
Authors:
Sayeh Sharify, Utkarsh Saxena, Zifei Xu, Wanzin Yazar, Ilya Soloveychik, Xin Wang
Abstract:Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges. This paper explores the potential of quantization to mitigate these challenges. We systematically study the combined application of three well-known post-training techniques, SmoothQuant, AWQ, and GPTQ, and provide a comprehensive analysis of their interactions and implications for advancing LLM quantization. We enhance the versatility of these methods by enabling quantization to microscaling (MX) formats, extending the applicability of these PTQ algorithms beyond their original fixed-point format targets. We show that combining different PTQ methods enables us to quantize models to 4-bit weights and 8-bit activations using the MXINT format with negligible accuracy loss compared to the uncompressed baseline.
Authors:
Saleh Ashkboos, Iman Mirzadeh, Moin Nabi, Keivan Alizadeh, Mehrdad Farajtabar, Mohammad Hossein Sekhavat, Fartash Faghri
Abstract:While large language models (LLMs) dominate the AI landscape, Small-scale Large Language Models (SLMs) are gaining attention due to cost and efficiency demands from consumers. However, there is limited research on the training behavior and computational requirements of SLMs. In this study, we explore the computational bottlenecks of training SLMs (up to 2B parameters) by examining the effects of various hyperparameters and configurations, including GPU type, batch size, model size, communication protocol, attention type, and the number of GPUs. We assess these factors on popular cloud services using metrics such as loss per dollar and tokens per second. Our findings aim to support the broader adoption and optimization of language model training for low-resource AI research institutes.
Authors:
Neal Lawton, Jack FitzGerald, Aishwarya Padmakumar, Anoop Kumar, Judith Gaspers, Greg Ver Steeg, Aram Galstyan
Abstract:QLoRA reduces the memory-cost of fine-tuning a large language model (LLM) with LoRA by quantizing the base LLM. However, quantization introduces quantization errors that negatively impact model performance after fine-tuning. In this paper we introduce QuAILoRA, a quantization-aware initialization for LoRA that mitigates this negative impact by decreasing quantization errors at initialization. Our method spends a small amount of computational overhead to compute this quantization-aware initialization, without increasing the memory-cost of fine-tuning. We evaluate our method on several causal language modeling and downstream evaluation tasks using several different model sizes and families. We observe that almost all LLMs fine-tuned with QuAILoRA achieve better validation perplexity. When evaluated on downstream tasks, we find that QuAILoRA yields improvements proportional to the negative effect of quantization error. On average, applying QuAILoRA to 4-bit QLoRA models yields 75% of the validation perplexity decrease and 86% of the downstream task accuracy increase as doubling the quantization precision to 8-bit, without increasing GPU memory utilization during fine-tuning.
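One simple way to realize a quantization-aware LoRA initialization of this kind, shown below as a hedged sketch rather than QuAILoRA's exact procedure, is to fit the LoRA factors to the quantization error with a truncated SVD; the `quantize` callable is a stand-in for whatever quantizer the base model uses.

```python
import torch

def quant_aware_lora_init(w, quantize, rank):
    """Initialize LoRA factors so that quantize(w) + B @ A starts close to w.

    w        : (out, in) full-precision base weight
    quantize : callable simulating the base-model quantizer (an assumption here)
    rank     : LoRA rank
    """
    err = w - quantize(w)                        # quantization error to absorb at init
    u, s, vh = torch.linalg.svd(err, full_matrices=False)
    b = u[:, :rank] * s[:rank]                   # (out, rank)
    a = vh[:rank]                                # (rank, in)
    return a, b                                  # quantize(w) + b @ a approximates w
```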
Authors:
Kaavya Chaparala, Bruce Torres Fischer, Guido Zarrella, Larry Kimura, Oiwi Parker Jones
Abstract:In this paper we address the challenge of improving Automatic Speech Recognition (ASR) for a low-resource language, Hawaiian, by incorporating large amounts of independent text data into an ASR foundation model, Whisper. To do this, we train an external language model (LM) on ~1.5M words of Hawaiian text. We then use the LM to rescore Whisper and compute word error rates (WERs) on a manually curated test set of labeled Hawaiian data. As a baseline, we use Whisper without an external LM. Experimental results reveal a small but significant improvement in WER when ASR outputs are rescored with a Hawaiian LM. The results support leveraging all available data in the development of ASR systems for underrepresented languages.
Authors:
Aashiq Muhamed, Jiarui Liu, Mona T. Diab, Virginia Smith
Abstract:Large foundation models (LFMs) power a diverse range of applications, but their deployment often requires adapting model size and performance to specific hardware constraints and latency requirements. Existing approaches rely on training independent models of various sizes, leading to storage redundancy, inconsistent behavior across sizes, and limited scalability. This work investigates post-training techniques for inducing elasticity into pre-trained LFMs, enabling dynamic adaptation of model size during inference based on specific needs. We frame this as decomposing LFM weight matrices into sparsely activating factors. While naive decompositions like weight SVD struggle to maintain performance across complex tasks while inducing the desired nested sub-structures, we propose two novel methods: SparseDecomp, which exploits sparse neuron activations in feed-forward networks to conditionally select decoder rows; and RankDecomp, which leverages the basis-agnostic nature of Transformers for low-rank weight decomposition. Integrating SparseDecomp and RankDecomp with GritLM-7B, a state-of-the-art LFM excelling in both generative and embedding tasks, we conduct a comparative analysis. Our results demonstrate that these approaches offer complementary benefits. SparseDecomp maintains robust performance across a wider range of sparsity levels, achieving average speedups of up to 4.6% with 25% sparsity. RankDecomp, conversely, yields more significant latency reduction, reaching a speedup of 22.2% at 25% sparsity, but exhibits greater sensitivity to increasing sparsity. This study provides valuable insights into leveraging post-training weight decomposition for developing efficient and adaptable LFMs, paving the way for future research on creating elastic and resource-aware models.
Authors:
Yongchang Hao, Yanshuai Cao, Lili Mou
Abstract:The performance of neural networks improves when more parameters are used. However, the model sizes are constrained by the available on-device memory during training and inference. Although applying techniques like quantization can alleviate the constraint, they suffer from performance degradation. In this work, we introduce NeuZip, a new weight compression scheme based on the entropy of floating-point numbers in neural networks. With NeuZip, we are able to achieve memory-efficient training and inference without sacrificing performance. Notably, we significantly reduce the memory footprint of training a Llama-3 8B model from 31GB to less than 16GB, while keeping the training dynamics fully unchanged. In inference, our method can reduce memory usage by more than half while maintaining near-lossless performance. Our code is publicly available.
Authors:
Lei Gao, Amir Ziashahabi, Yue Niu, Salman Avestimehr, Murali Annavaram
Abstract:Large Language Models (LLMs) have demonstrated exceptional performance in automating various tasks, such as text generation and summarization. Currently, LLMs are trained and fine-tuned on large cloud servers. Deploying and fine-tuning these models on resource-constrained edge devices remains a significant challenge due to their substantial memory and computational requirements. This paper introduces a resource-efficient zeroth-order optimization approach that lowers the barriers for fine-tuning LLMs in such constrained environments. Our method features a parallelized randomized gradient estimation (P-RGE) technique, which performs gradient estimation with high parallel efficiency. P-RGE leverages outer-loop and inner-loop parallelization to perform multiple function queries and forward passes in parallel, reducing the wall-clock end-to-end training time. By integrating this technique with parameter-efficient fine-tuning methods (e.g., LoRA) and on-device inference engines (e.g., ExecuTorch), we demonstrate efficient fine-tuning of LLMs on both server-side and edge devices. Experiments show that P-RGE achieves significant runtime speedups and memory savings while maintaining fine-tuning accuracy, which paves the way for more practical deployment of LLMs in real-time, on-device applications.
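For intuition, the following is a generic PyTorch sketch of two-point randomized gradient estimation, the basic building block behind zeroth-order methods like P-RGE; the outer-loop/inner-loop parallelization and ExecuTorch integration described above are not reproduced here, and the query count is an arbitrary choice.

```python
import torch

def rge_gradient(loss_fn, params, eps=1e-3, num_queries=4):
    """Estimate the gradient of loss_fn at params using only forward evaluations.

    loss_fn     : callable taking a flat 1-D parameter tensor and returning a scalar loss
    params      : flat 1-D parameter tensor
    num_queries : number of random perturbation directions averaged per estimate
    """
    grad = torch.zeros_like(params)
    for _ in range(num_queries):
        z = torch.randn_like(params)                   # random perturbation direction
        loss_plus = loss_fn(params + eps * z)
        loss_minus = loss_fn(params - eps * z)
        grad += (loss_plus - loss_minus) / (2 * eps) * z
    return grad / num_queries                          # plug into any first-order optimizer
```

Because each query is an independent forward pass, the perturbations can be batched or distributed, which is the kind of parallelism the abstract exploits.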
Authors:
Hossein Rajabzadeh, Aref Jafari, Aman Sharma, Benyamin Jami, Hyock Ju Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
Abstract:Large Language Models (LLMs), with their increasing depth and number of parameters, have demonstrated outstanding performance across a variety of natural language processing tasks. However, this growth in scale leads to increased computational demands, particularly during inference and fine-tuning. To address these challenges, we introduce EchoAtt, a novel framework aimed at optimizing transformer-based models by analyzing and leveraging the similarity of attention patterns across layers. Our analysis reveals that many inner layers in LLMs, especially larger ones, exhibit highly similar attention matrices. By exploiting this similarity, EchoAtt enables the sharing of attention matrices in less critical layers, significantly reducing computational requirements without compromising performance. We incorporate this approach within a knowledge distillation setup, where a pre-trained teacher model guides the training of a smaller student model. The student model selectively shares attention matrices in layers with high similarity while inheriting key parameters from the teacher. Our best results with TinyLLaMA-1.1B demonstrate that EchoAtt improves inference speed by 15%, training speed by 25%, and reduces the number of parameters by approximately 4%, all while improving zero-shot performance. These findings highlight the potential of attention matrix sharing to enhance the efficiency of LLMs, making them more practical for real-time and resource-limited applications.
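As a rough illustration of the layer-similarity analysis this abstract relies on, the sketch below computes cosine similarity between attention maps of consecutive layers; the statistic, and the idea of thresholding it to pick sharing candidates, are assumptions for the example rather than EchoAtt's exact criterion.

```python
import torch

def attention_layer_similarity(attn_maps):
    """Cosine similarity between attention matrices of consecutive layers.

    attn_maps: list of (heads, seq, seq) attention tensors, one per layer
    """
    sims = []
    for prev, curr in zip(attn_maps[:-1], attn_maps[1:]):
        a, b = prev.flatten(), curr.flatten()
        sims.append(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
    # layers whose similarity exceeds a chosen threshold would be candidates
    # for reusing the previous layer's attention matrix
    return sims
```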
Authors:
Ali Shiraee Kasmaee, Mohammad Khodadad, Mohammad Arshi Saloot, Nick Sherck, Stephen Dokas, Hamidreza Mahyar, Soheila Samiee
Abstract:Recent advancements in language models have started a new era of superior information retrieval and content generation, with embedding models playing an important role in optimizing data representation efficiency and performance. While benchmarks like the Massive Text Embedding Benchmark (MTEB) have standardized the evaluation of general domain embedding models, a gap remains in specialized fields such as chemistry, which require tailored approaches due to domain-specific challenges. This paper introduces a novel benchmark, the Chemical Text Embedding Benchmark (ChemTEB), designed specifically for the chemical sciences. ChemTEB addresses the unique linguistic and semantic complexities of chemical literature and data, offering a comprehensive suite of tasks on chemical domain data. Through the evaluation of 34 open-source and proprietary models using this benchmark, we illuminate the strengths and weaknesses of current methodologies in processing and understanding chemical information. Our work aims to equip the research community with a standardized, domain-specific evaluation framework, promoting the development of more precise and efficient NLP models for chemistry-related applications. Furthermore, it provides insights into the performance of generic models in a domain-specific context. ChemTEB comes with open-source code and data, contributing further to its accessibility and utility.
Authors:
Zifei Xu, Alexander Lan, Wanzin Yazar, Tristan Webb, Sayeh Sharify, Xin Wang
Abstract:Generalization abilities of well-trained large language models (LLMs) are known to scale predictably as a function of model size. In contrast to the existence of practical scaling laws governing pre-training, the quality of LLMs after post-training compression remains highly unpredictable, often requiring case-by-case validation in practice. In this work, we attempted to close this gap for post-training weight quantization of LLMs by conducting a systematic empirical study on multiple LLM families quantized to numerous low-precision tensor data types using popular weight quantization techniques. We identified key scaling factors pertaining to characteristics of the local loss landscape, based on which the performance of quantized LLMs can be reasonably well predicted by a statistical model.
Authors:
Yanhong Li, Karen Livescu, Jiawei Zhou
Abstract:We introduce Chunk-Distilled Language Modeling (CD-LM), an approach to text generation that addresses two challenges in current large language models (LLMs): the inefficiency of token-level generation, and the difficulty of adapting to new data and knowledge. Our method combines deep network-based LLMs with a straightforward retrieval module, which allows the generation of multi-token text chunks at a single decoding step. Our retrieval framework enables flexible construction of model- or domain-specific datastores, either leveraging the internal knowledge of existing models, or incorporating expert insights from human-annotated corpora. This adaptability allows for enhanced control over the language model’s distribution without necessitating additional training. We present the CD-LM formulation along with performance metrics demonstrating its ability to improve language model performance and efficiency across a diverse set of downstream tasks.
Authors:
Gautham Krishna Gudur, Edison Thomaz
Abstract:Audio classification tasks like keyword spotting and acoustic event detection often require large labeled datasets, which are computationally expensive and impractical for resource-constrained devices. While active learning techniques attempt to reduce labeling efforts by selecting the most informative samples, they struggle with scalability in real-world scenarios involving thousands of audio segments. In this paper, we introduce an approach that leverages dataset distillation as an alternative strategy to active learning to address the challenge of data efficiency in real-world audio classification tasks. Our approach synthesizes compact, high-fidelity coresets that encapsulate the most critical information from the original dataset, significantly reducing the labeling requirements while offering competitive performance. Through experiments on three benchmark datasets – Google Speech Commands, UrbanSound8K, and ESC-50 – our approach achieves up to a ∼3,000x reduction in data points, and requires only a negligible fraction of the original training data while matching the performance of popular active learning baselines.
Authors:
Abstract:Transformer-based Large Language Models (LLMs) have become increasingly important. However, due to the quadratic time complexity of attention computation, scaling LLMs to longer contexts incurs extremely slow inference speed and high GPU memory consumption for caching key-value (KV) vectors. This paper proposes RetrievalAttention, a training-free approach to both accelerate attention computation and reduce GPU memory consumption. By leveraging the dynamic sparsity of the attention mechanism, RetrievalAttention proposes to build approximate nearest neighbor search (ANNS) indexes for KV vectors in CPU memory and retrieve the most relevant ones through vector search during generation. Unfortunately, we observe that off-the-shelf ANNS indexes are often ineffective for such retrieval tasks due to the out-of-distribution (OOD) problem between query vectors and key vectors in the attention mechanism. RetrievalAttention addresses the OOD challenge by designing an attention-aware vector search algorithm that can adapt to the distribution of query vectors. Our evaluation demonstrates that RetrievalAttention achieves near full attention accuracy while only requiring access to 1–3% of the data. This leads to a significant reduction in the inference cost of long-context LLMs, with a much lower GPU memory footprint. In particular, RetrievalAttention only needs a single NVIDIA RTX4090 (24GB) to serve 128K tokens for LLMs with 8B parameters, and is capable of generating one token in 0.188 seconds.
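The core idea can be sketched as attention restricted to the most relevant cached keys for each query; in the toy code below an exact inner-product search stands in for the CPU-side ANNS index, and the retrieval budget is an arbitrary choice rather than the paper's setting.

```python
import torch

def retrieval_attention(query, keys, values, top_n=64):
    """Single-head attention for one query over only its top-n most relevant cached keys.

    query : (head_dim,)    keys, values : (seq_len, head_dim)
    """
    scores = keys @ query / keys.shape[-1] ** 0.5          # relevance of each cached key
    idx = scores.topk(min(top_n, keys.shape[0])).indices   # a real system would use an ANNS index
    weights = torch.softmax(scores[idx], dim=0)            # softmax only over retrieved entries
    return weights @ values[idx]
```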
Authors:
Yucheng Li, Huiqiang Jiang, Qianhui Wu, Xufang Luo, Surin Ahn, Chengruidong Zhang, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu
Abstract:Long-context Large Language Models (LLMs) have unlocked numerous possibilities for downstream applications, many of which involve multiple requests sharing the same input context. Recent inference frameworks like vLLM and SGLang, as well as LLM providers such as OpenAI, Google and Anthropic, have employed prefix caching techniques to accelerate multi-request workloads with shared context. However, existing long-context methods are primarily evaluated on single-query testing, failing to demonstrate their true capability in real-world applications that often require KV cache reuse for follow-up queries. To address this gap, we introduce SharedContextBench, a comprehensive long-context benchmark that reveals how lossy long-context methods are in KV cache reuse scenarios. Specifically, it encompasses 12 tasks with two shared context modes, covering four categories of long-context abilities: string retrieval, semantic retrieval, global information processing, and multi-task capabilities. Using our benchmark, we evaluated five categories of long-context solutions, including Gated Linear RNNs (Codestral-Mamba), Mamba-Attention hybrids (Jamba-1.5-Mini), and efficient methods like sparse attention, KV cache compression, and prompt compression, on six transformer-based long-context LLMs: Llama-3.1-8B/70B, Qwen2.5-72B/32B, Llama-3-8B-262K, and GLM-4-9B. Our findings show that sub-O(n) memory methods often struggle to maintain accuracy in multi-turn scenarios, while sparse encoding methods with O(n) memory and sub-O(n²) computation in prefilling generally perform well. Additionally, dynamic sparse patterns in prefilling often produce more expressive memory (KV cache) compared to static methods, and layer-level sparsity in hybrid architectures reduces memory usage while yielding promising results.
Authors:
Niklas Schmidinger, Lisa Schneckenreiter, Philipp Seidl, Johannes Schimunek, Pieter-Jan Hoedt, Johannes Brandstetter, Andreas Mayr, Sohvi Luukkonen, Sepp Hochreiter, Günter Klambauer
Abstract:Language models for biological and chemical sequences enable crucial applications such as drug discovery, protein engineering, and precision medicine. Currently, these language models are predominantly based on Transformer architectures. While Transformers have yielded impressive results, their quadratic runtime dependency on sequence length complicates their use for long genomic sequences and in-context learning on proteins and chemical sequences. Recently, the recurrent xLSTM architecture has been shown to perform favorably compared to Transformers and modern state-space models (SSMs) in the natural language domain. Similar to SSMs, xLSTMs have linear runtime dependency and allow for constant-memory decoding at inference time, which makes them prime candidates for modeling long-range dependencies in biological and chemical sequences. In this work, we tailor xLSTM towards these domains and we propose a suite of language models called Bio-xLSTM. Extensive experiments in three large domains, genomics, proteins, and chemistry, were performed to assess xLSTM’s ability to model biological and chemical sequences. The results show that Bio-xLSTM is a highly proficient generative model for DNA, protein, and chemical sequences, learns rich representations, and can perform in-context learning for proteins and small molecules.
Authors:
Jort Vincenti, Karim Abdel Sadek, Joan Velja, Matteo Nulli, Metod Jazbec
Abstract:Increasing the size of large language models (LLMs) has been shown to lead to better performance. However, this comes at the cost of slower and more expensive inference. Early-exiting is a promising approach for improving the efficiency of LLM inference by enabling next token prediction at intermediate layers. Yet, the large vocabulary size in modern LLMs makes the confidence estimation required for exit decisions computationally expensive, diminishing the efficiency gains. To address this, we propose dynamically pruning the vocabulary at test time for each token. Specifically, the vocabulary is pruned at one of the initial layers, and the smaller vocabulary is then used throughout the rest of the forward pass. Our experiments demonstrate that such post-hoc dynamic vocabulary pruning improves the efficiency of confidence estimation in early-exit LLMs while maintaining competitive performance.
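A minimal sketch of this two-step scheme: select a reduced vocabulary once at an early layer, then compute exit confidences only over that subset; the subset size and the top-1/top-2 margin used as the confidence signal are illustrative assumptions, not the paper's exact choices.

```python
import torch

def prune_vocab(hidden, unembedding, keep=1000):
    """Select a reduced vocabulary at an early layer; later exit decisions only
    score this subset. hidden: (d,), unembedding: (vocab, d)."""
    logits = hidden @ unembedding.T            # full-vocab projection, done once per token
    return logits.topk(keep).indices           # indices of the retained tokens

def exit_confidence(hidden, unembedding, vocab_idx):
    """Confidence for an early-exit decision, restricted to the pruned vocabulary."""
    logits = hidden @ unembedding[vocab_idx].T # cheap partial projection over `keep` tokens
    probs = torch.softmax(logits, dim=-1)
    top2 = probs.topk(2).values
    return (top2[0] - top2[1]).item()          # e.g. top-1 vs top-2 margin as the signal
```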
Authors:
Ofir Zafrir, Igor Margulis, Dorin Shteyman, Guy Boudoukh
Abstract:Speculative Decoding has gained popularity as an effective technique for accelerating the auto-regressive inference process of Large Language Models (LLMs). However, Speculative Decoding entirely relies on the availability of efficient draft models, which are often lacking for many existing language models due to a stringent constraint of vocabulary incompatibility. In this work we introduce FastDraft, a novel and efficient approach for pre-training and aligning a draft model to any large language model by incorporating efficient pre-training, followed by finetuning over synthetic datasets generated by the target model. We demonstrate FastDraft by training two highly parameter efficient drafts for the popular Phi-3-mini and Llama-3.1-8B models. Using FastDraft, we were able to produce a draft with approximately 10 billion tokens on a single server with 8 accelerators in under 24 hours. Our results show that the draft model achieves impressive results in key metrics of acceptance rate, block efficiency and up to 3x memory bound speed up when evaluated on code completion and up to 2x in summarization, text completion and instruction tasks. Due to its high quality, FastDraft unlocks large language model inference on AI-PC and other edge-devices.
Authors:
Tushar Shinde, Ritika Jain, Avinash Kumar Sharma
Abstract:Speech Emotion Recognition (SER) systems are essential in advancing human-machine interaction. While deep learning models have shown substantial success in SER by eliminating the need for handcrafted features, their high computational and memory requirements, alongside intensive hyper-parameter optimization, limit their deployment on resource-constrained edge devices. To address these challenges, we introduce an optimized and computationally efficient Multilayer Perceptron (MLP)-based classifier within a custom SER framework. We further propose a novel, layer-wise adaptive quantization scheme that compresses the model by adjusting bit-width precision according to layer importance. This layer importance is calculated based on statistical measures such as parameter proportion, entropy, and weight variance within each layer. Our approach achieves an optimal balance between model size reduction and performance retention, ensuring that the quantized model maintains accuracy within acceptable limits. Traditional fixed-precision methods, while computationally simple, are less effective at reducing model size without compromising performance. In contrast, our scheme provides a more interpretable and computationally efficient solution. We evaluate the proposed model on standard SER datasets using features such as Mel-Frequency Cepstral Coefficients (MFCC), Chroma, and Mel-spectrogram. Experimental results demonstrate that our adaptive quantization method achieves performance competitive with state-of-the-art models while significantly reducing model size, making it highly suitable for deployment on edge devices.
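The sketch below illustrates one way such an importance-driven bit allocation could look, scoring layers by weight variance and parameter share and distributing a total bit budget accordingly; the scoring, rounding, and clipping are assumptions for the example, not the exact scheme proposed here.

```python
import numpy as np

def layer_bit_widths(layers, total_budget_bits, min_bits=2, max_bits=8):
    """Assign per-layer bit widths in proportion to a simple importance score.

    layers: list of 1-D numpy arrays holding each layer's weights
    """
    sizes = np.array([w.size for w in layers], dtype=float)
    importance = np.array([w.var() for w in layers]) * (sizes / sizes.sum())
    share = importance / importance.sum()            # fraction of the bit budget per layer
    bits_per_weight = share * total_budget_bits / sizes
    # clipping keeps every layer in a usable range, so the realized budget may deviate slightly
    return np.clip(np.round(bits_per_weight), min_bits, max_bits).astype(int)
```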
Authors:
Rongzhi Zhang, Kuang Wang, Liyuan Liu, Shuohang Wang, Hao Cheng, Chao Zhang, Yelong Shen
Abstract:The Key-Value (KV) cache is a crucial component in serving transformer-based autoregressive large language models (LLMs), enabling faster inference by storing previously computed KV vectors. However, its memory consumption scales linearly with sequence length and batch size, posing a significant bottleneck in LLM deployment. Existing approaches to mitigate this issue include: (1) efficient attention variants integrated in upcycling stages, which require extensive parameter tuning and are thus unsuitable for pre-trained LLMs; (2) KV cache compression at test time, primarily through token eviction policies, which often overlook inter-layer dependencies and can be task-specific. This paper introduces an orthogonal approach to KV cache compression. We propose a low-rank approximation of KV weight matrices, allowing for plug-in integration with existing transformer-based LLMs without model retraining. To effectively compress the KV cache at the weight level, we adjust for layerwise sensitivity and introduce a progressive compression strategy, which is supported by our theoretical analysis on how compression errors accumulate in deep networks. Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages. Extensive experiments with LLaMA models ranging from 8B to 70B parameters across various tasks show that our approach significantly reduces the GPU memory footprint while maintaining performance.
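A minimal sketch of the weight-level idea: factor the K/V projection matrices with a truncated SVD so that the cache can live in a rank-r latent space; the per-layer rank selection and progressive compression schedule described above are not modeled, and the uniform rank is an assumption.

```python
import torch

def low_rank_kv_weights(w_k, w_v, rank):
    """Factor K/V projection weights as W ~ down @ up so that only the r-dimensional
    latent activations (hidden @ down) need to be cached; K/V are reconstructed on the
    fly via the small `up` factor.

    w_k, w_v: (d_model, d_head) projection weights; returns (down, up) pairs.
    """
    def factor(w):
        u, s, vh = torch.linalg.svd(w, full_matrices=False)
        down = u[:, :rank] * s[:rank]      # (d_model, rank): projects the hidden state down
        up = vh[:rank]                     # (rank, d_head): reconstructs K/V when needed
        return down, up
    return factor(w_k), factor(w_v)
```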
Authors:
Youngseog Chung, Dhruv Malik, Jeff Schneider, Yuanzhi Li, Aarti Singh
Abstract:
Authors:
MohammadAli SadraeiJavaeri, Ehsaneddin Asgari, Alice Carolyn McHardy, Hamid Reza Rabiee
Abstract:Soft prompt tuning techniques have recently gained traction as an effective strategy for the parameter-efficient tuning of pretrained language models, particularly minimizing the required adjustment of model parameters. Despite their growing use, achieving optimal tuning with soft prompts, especially for smaller datasets, remains a substantial challenge. This study makes two contributions in this domain: (i) we introduce SuperPos-Prompt, a new reparameterization technique employing the superposition of multiple pretrained vocabulary embeddings to improve the learning of soft prompts. Our experiments across several GLUE and SuperGLUE benchmarks consistently highlight SuperPos-Prompt’s superiority over Residual Prompt tuning, exhibiting an average score increase of +6.4 in T5-Small and +5.0 in T5-Base along with a faster convergence. Remarkably, SuperPos-Prompt occasionally outperforms even full fine-tuning methods. (ii) Additionally, we demonstrate enhanced performance and rapid convergence by omitting dropouts from the frozen network, yielding consistent improvements across various scenarios and tuning methods.
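For illustration, the following is a small PyTorch module that reparameterizes each soft prompt vector as a learned mixture of frozen pretrained token embeddings; the basis size, random basis sampling, and plain linear mixing are assumptions for the sketch, not necessarily SuperPos-Prompt's exact formulation.

```python
import torch
import torch.nn as nn

class SuperposedSoftPrompt(nn.Module):
    """Soft prompt whose vectors are learned superpositions of frozen vocabulary embeddings."""

    def __init__(self, embedding_matrix, prompt_len=10, num_basis=128):
        super().__init__()
        idx = torch.randperm(embedding_matrix.shape[0])[:num_basis]
        # frozen (num_basis, d) basis drawn from the pretrained embedding table
        self.register_buffer("basis", embedding_matrix[idx].clone())
        # only the mixing coefficients are trained
        self.mix = nn.Parameter(torch.randn(prompt_len, num_basis) * 0.01)

    def forward(self):
        # each prompt vector is a linear combination of the frozen basis embeddings;
        # the result is prepended to the input embeddings of the frozen model
        return self.mix @ self.basis          # (prompt_len, d)
```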
Authors:
Soumajyoti Sarkar, Leonard Lausen, Thomas Brox, Volkan Cevher, Sheng Zha, George Karypis
Abstract:Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling. These models use conditionally activated feedforward subnetworks in transformer blocks, allowing for a separation between total model parameters and per-example computation. However, large token-routed SMoE models face a significant challenge: during inference, the entire model must be used for a sequence or a batch, resulting in high latencies in a distributed setting that offsets the advantages of per-token sparse activation. Our research explores task-specific model pruning to inform decisions about designing SMoE architectures, mainly modulating the choice of expert counts in pretraining. We investigate whether such pruned models offer advantages over smaller SMoE models trained from scratch, when evaluating and comparing them individually on tasks. To that end, we introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training. Our findings reveal a threshold pruning factor for the reduction that depends on the number of experts used in pretraining, above which the reduction starts to degrade model performance. These insights contribute to our understanding of model design choices when pretraining with SMoE architectures, particularly useful when considering task-specific inference optimization for later stages.
Authors:
Costin-Andrei Oncescu, Sanket Purandare, Stratos Idreos, Sham Kakade
Abstract:While transformers have been at the core of most recent advancements in sequence generative models, their computational cost remains quadratic in sequence length. Several subquadratic architectures have been proposed to address this computational issue. Some of them, including long convolution sequence models (LCSMs), such as Hyena, address this issue at training time but remain quadratic during inference. We propose a method for speeding up LCSMs’ inference to quasilinear time, identify the key properties that make this possible, and propose a general framework that exploits these. Our approach, inspired by previous work on relaxed polynomial interpolation, is based on a tiling which helps decrease memory movement and share computation. It has the added benefit of allowing for almost complete parallelization across layers of the position-mixing part of the architecture. Empirically, we provide a proof-of-concept implementation for Hyena-like settings, which gets up to 4× improvement over standard inference.