The fourth version of the Efficient Natural Language and Speech Processing (ENLSP-IV) workshop will focus on how to make large language and foundation models more efficient in terms of Architecture, Training, and Inference in their real-world applications. This year, following the trend in industry and academia, we place more emphasis on investigating new architectures to make future language and foundation models more efficient. Moreover, we highlight the importance of comprehensively evaluating and benchmarking new efficient models from different practical aspects.
The workshop program offers an interactive platform that gathers experts and talent from academia and industry through invited talks, a panel discussion, paper submissions and reviews, interactive poster sessions, oral presentations, and mentorship sessions for new researchers.
This will be a unique opportunity to discuss and share challenging problems, build connections, exchange ideas, brainstorm, and foster future collaborations. The topics of this workshop can be of interest to people working on general machine learning, deep learning, hardware, optimization, theory, and applications.
Overview
As large language models (e.g. GPT-3, GPT-4, Llama 3, PaLM, Gemini, and PanGu-Σ), pre-trained speech models (e.g. wav2vec, HuBERT, WavLM, Whisper, Conformer-1 and Conformer-2) and other foundation models (e.g. GPT-4o and Stable Diffusion) advance rapidly and become more prominent and widespread, improving their efficiency becomes increasingly crucial.
While it is true that computational power and GPU resources have played a significant role in the success of these models, we should also be aware that using more computational resources (a) increases the cost of training and deploying such models, (b) makes the models less accessible, (c) limits contributions from the broader research community, and (d) increases the environmental footprint of the models. Moreover, most of these pre-trained models are heavily over-parameterized, and their efficiency is under question. This lack of efficiency can severely limit the application of these advanced models in practice.
Building upon the framework of our previous three editions, this workshop remains dedicated to investigating solutions for enhancing the efficiency of pre-trained language and foundation models, while introducing fresh and important new topics to the community and encouraging contributions on them.
Just to highlight a few: (1) Despite the ubiquitous use of Transformers, they suffer from quadratic computational complexity, which limits their efficiency, especially for longer sequence lengths. Should we improve the efficiency of Transformers (e.g. as in Hedgehog and Gated Linear Attention) or look for other architectures (e.g. Mamba, Jamba, RWKV, xLSTM, and SSMs)? (2) For accelerating training, we have seen the significant impact of hardware-efficient implementations such as FlashAttention. Should we focus more on these hardware-aware solutions or more on new/improved architectures?
(3) For efficient inference, there are solutions such as: Speculative Decoding [Link1] [Link2], whose performance is strongly model- and task-dependent and which requires the draft and target models to share the same vocabulary (tokenizer); improved KV-caching (e.g. [Link]), which offers only a limited speed-up; and many-in-one models such as SortedNet, MatFormer, and LayerSkip, whose sub-models underperform their individually trained counterparts (a minimal sketch of speculative decoding is given after this list).
(4) While there are many so-called efficient solutions in the literature, there is no fair, comprehensive, and practical evaluation of these models or comparison among them. For example, we do not know the extent of hallucination in the new architectures versus Transformers (e.g. in [Link]).
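To make item (3) above concrete, below is a minimal, self-contained Python sketch of greedy speculative decoding. The `draft_model` and `target_model` callables are hypothetical stand-ins for real language models sharing one tokenizer, and the greedy accept/replace rule shown here is a simplification of the sampling-based verification used in practice.

```python
# Minimal sketch of greedy speculative decoding, assuming toy `draft_model` and
# `target_model` callables that map a token sequence to the next token.
def speculative_decode(prefix, draft_model, target_model, num_draft=4, max_new=32):
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1) Draft: the cheap model proposes `num_draft` tokens autoregressively.
        draft, ctx = [], list(out)
        for _ in range(num_draft):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: the target model checks each drafted position (one batched
        #    pass in practice; a loop here for clarity), keeps the longest prefix
        #    it agrees with, and appends its own correction token on a mismatch.
        accepted, ctx = [], list(out)
        for t in draft:
            expected = target_model(ctx)
            if expected != t:
                accepted.append(expected)   # target's own token replaces the miss
                break
            accepted.append(t)
            ctx.append(t)
        out.extend(accepted)
    return out

# Toy usage: a "bigram" target model and a draft model that agrees most of the time.
target = lambda seq: (seq[-1] + 1) % 50
draft  = lambda seq: (seq[-1] + 1) % 50 if len(seq) % 5 else 0
print(speculative_decode([1, 2], draft, target, max_new=10))
```

The speed-up comes from the target model validating several drafted tokens per call instead of generating one token at a time; the cost of a mismatch is bounded because the target's own token is always kept.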
Call for Papers
Investing in the future of language and foundation models requires a concrete effort to enhance their efficiency across multiple dimensions (including architecture, training, and inference) and to establish a comprehensive evaluation framework.
To encourage engagement from the NeurIPS community, we present several active research topics in this field that invite participation and contributions. The scope of this workshop includes, but is not limited to, the following topics:
Efficient Architectures: Proposing alternative architectures that are more efficient than Transformers (in terms of computational complexity, memory footprint, or handling longer sequence lengths) or modifying Transformer architectures to make them more efficient
- Linear and sub-quadratic Transformers, sparse-attention Transformers
- New architectures for LLMs and foundation models and their scalability
- Evaluation and benchmarking of new architectures (fair comparison of different models)
- Long sequence modeling
- Dense vs. sparse architectures (MoEs)
Efficient Training: How can we reduce the cost of pre-training or fine-tuning new models?
- More efficient pre-training solutions, from better initialization and hyper-parameter tuning to better optimization which lowers the cost of pre-training
- Parameter efficient fine-tuning (PEFT) solutions for large pre-trained models
- Efficient instruction tuning, prompt engineering and in-context learning
- Hardware-aware solutions (e.g. better CUDA kernels), memory read/write aware solutions
- Data-efficient training, reducing the requirement for labeled data, data compression and distillation
Efficient Inference: How can we reduce the cost of inference for LLMs and foundation models?
- Improved speculative sampling for LLMs, self-speculative sampling, selecting among multiple drafts, one draft model for different heterogeneous target models
- Neural model compression techniques such as quantization, pruning, and knowledge distillation
- Improved KV-caching solutions for Transformers
- Distributed inference of large pre-trained models
- Serving many target devices with one model, many-in-one models, early exiting, elastic networks
Evaluation and Benchmarking of Efficient Models: Introducing new efficient solutions underscores the need for comprehensive benchmarks to accurately evaluate their efficacy and performance.
- Datasets, benchmarks, leaderboards for evaluating efficient models
- Benchmarking the performance of efficient models from different perspectives such as reasoning, hallucination, understanding, and generation quality
- Benchmarking efficiency of models in terms of their memory footprint, training time, inference time, different target hardware devices and inference platforms (e.g. GPU vs. CPU)
Efficient Solutions in other Modalities and Applications
- Efficiency of foundation or pre-trained models in multi-modal setups and other modalities (beyond NLP and speech) such as biology, chemistry, computer vision, and time series
- Efficient representations (e.g. Matryoshka representation) and models in dense retrieval and search
- Efficient Federated learning, lower communication costs, tackling heterogeneous data and models
- Efficient graph and LLM joint learning
Submission Instructions
You are invited to submit your papers through our CMT submission portal (Link). All submitted papers must be anonymized for double-blind review. We expect each paper to be reviewed by at least three reviewers. The content of the paper (excluding the references and supplementary materials) should not exceed 8 pages for Long Papers and 4 pages for Short Papers, strictly following the NeurIPS template style (Link). Please be advised that the NeurIPS submission checklist is not needed for our workshop submissions.
Authors can submit up to 100 MB of supplementary materials separately. Authors are highly encouraged to submit their code for reproducibility purposes. According to the NeurIPS workshop guidelines, already published papers are discouraged from submission, but you may submit arXiv papers or papers currently under review (for example, NeurIPS submissions can be submitted concurrently to workshops). Moreover, a work that is presented at the main NeurIPS conference should not appear in a workshop. Please make sure to indicate the complete list of conflicts of interest for all the authors of your paper. To encourage higher-quality submissions, our sponsors are offering Best Paper and Best Poster Awards to outstanding original oral and poster presentations (upon nomination by the reviewers). Bear in mind that our workshop is not archival, but the accepted papers will be hosted on the workshop website. Moreover, we are currently negotiating with a publisher to host opt-in accepted papers in a special proceedings issue for our workshop.
Important Dates:
- Special NeurIPS Fast Track Submission Deadline: September 30, 2024, Anywhere on Earth (AOE)
- Submission Deadline: September 15, 2024, Anywhere on Earth (AOE)
- Acceptance Notification: October 09, 2024 (AOE)
- Camera-Ready Submission: October 25, 2024 (AOE)
- Workshop Date: December 14, 2024
Keynote Speakers
Danqi Chen
Princeton
Bhavana Dalvi
Allen Institute for AI
Weizhu Chen
Microsoft
Tri Dao
Princeton/Together AI
Hannaneh Hajishirzi
University of Washington
Navdeep Jaitly
Apple
Lili Mou
University of Alberta
Panelists
Marjan Ghazvininejad
Meta
Joel Hestness
Cerebras
Navdeep Jaitly
Apple
Katie Derthick
Microsoft
Schedule
Title: (KeyNote Talk) Efficiency through Learning from Experience
Presenter: Dr. Bhavana Dalvi Mishra
Bio: Dr. Bhavana Dalvi Mishra is a Lead Research Scientist at the Allen Institute for AI (Ai2). Her research focuses on NLP, interactive reasoning, and scientific discovery. She obtained her Ph.D. in Computer Science from Carnegie Mellon University in 2015 and earned her Master's in Computer Science from the Indian Institute of Technology, Bombay in 2007. She has received several awards, including two Best Paper runner-up awards, a Google Ph.D. Fellowship, and the Barbara Lazarus Women@IT Fellowship from CMU for her contributions to NLP and AI.
Abstract: Despite the physiological limitations of the human brain, humans are remarkably efficient thinkers, in large part because they can learn from experience, allowing them to avoid prior reasoning errors and quickly jump to conclusions that previously took substantial effort. Similarly, language models (LMs) can rapidly improve their inference-time efficiency through inference-time learning, supplementing lower-level methods like fast decoding and caching. I'll describe two agent-based systems (CLIN and SSO) that do this, using an external RAG (retrieval-augmented generation) memory to help the agent navigate a complex, virtual environment. Unlike typical RAG systems, the memory is dynamic and updated after each task (including forgetting unhelpful learnings). In addition, unlike reinforcement-based continual learning techniques, these systems rapidly learn from just a handful of examples by exploiting LMs to conjecture useful generalizations of past experiences. I'll outline three critical activities in this process - what to remember, how to index those memories, and how to retrieve from that index - and how those choices impact the effectiveness of the resulting agent. While this notion of efficiency is a little different from foundational architectural considerations, I'll show that it is nonetheless powerful, and an important additional tool in the toolbox for efficient future applications.
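As a rough illustration of the dynamic-memory idea described in the abstract (and not the actual CLIN/SSO implementation), the toy Python class below stores learnings after each task, retrieves them by simple word overlap, and supports forgetting entries that later prove unhelpful.

```python
# A toy sketch of a dynamic retrieval memory: learnings are added after each
# task, retrieved by word overlap, and unhelpful ones can be forgotten.
# The class name and scoring rule are illustrative placeholders only.
class ExperienceMemory:
    def __init__(self):
        self.entries = []  # list of (task_description, learning) pairs

    def remember(self, task, learning):
        self.entries.append((task, learning))

    def forget(self, predicate):
        # e.g. drop learnings that later proved unhelpful
        self.entries = [(t, l) for t, l in self.entries if not predicate(t, l)]

    def retrieve(self, query, k=3):
        q = set(query.lower().split())
        scored = [(len(q & set(t.lower().split())), t, l) for t, l in self.entries]
        scored.sort(reverse=True)
        return [l for s, t, l in scored[:k] if s > 0]

mem = ExperienceMemory()
mem.remember("open the locked door", "look for a key in drawers first")
mem.remember("boil water", "the stove must be turned on before heating")
print(mem.retrieve("how to open a door that is locked"))
```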
Title: (KeyNote Talk) Multi-Teacher Distillation: An Ensemble-Then-Distill Approach
Presenter: Prof. Lili Mou
Bio: Dr. Lili Mou is an Assistant Professor in the Department of Computing Science, University of Alberta. He is also an Alberta Machine Intelligence Institute (Amii) Fellow and a Canada CIFAR AI (CCAI) Chair. Lili received his BS and PhD degrees in 2012 and 2017, respectively, from the School of EECS, Peking University. After that, he worked as a postdoctoral fellow at the University of Waterloo. His research interests mainly lie in designing novel machine learning algorithms and frameworks for NLP. He has publications at top conferences and journals, including ACL, EMNLP, TACL, ICML, ICLR, and NeurIPS. He also presented tutorials at EMNLP'19 and ACL'20. He received an AAAI New Faculty Highlight Award in 2021.
Abstract: Knowledge distillation (KD) aims to transfer the knowledge in a large model (called a teacher) into a small one (called a student), and has become an emerging research topic as the sizes of deep learning models keep growing. Today, there are abundant readily available large models, such as ChatGPT, LLaMA, and T5. It then becomes natural to ask: Can we distill the knowledge from multiple teachers? At first glance, it appears easy to perform multi-teacher KD, as we can simply train the student from the union of teachers' predictions. However, I would argue that such a naïve attempt may not work well for multi-teacher KD. This is because traditional KD adopts the cross-entropy loss, which tends to yield a smooth distribution. In this talk, I will present a novel ensemble-then-distill approach, which builds an ensemble of teacher models to train the student. I will also discuss applications to text generation and syntactic parsing.
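A minimal numpy sketch of the ensemble-then-distill idea, under the assumption that the teachers' next-token distributions are simply averaged into one target (the actual ensembling rule in the talk may differ): the student is trained against a single ensemble distribution rather than the union of teacher predictions.

```python
# Combine several teachers' next-token distributions into one ensemble target,
# then compute the student's distillation loss against that single target.
# Uniform averaging is a placeholder for the ensembling rule.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_then_distill_loss(teacher_logits, student_logits):
    # teacher_logits: (num_teachers, vocab), student_logits: (vocab,)
    ensemble = softmax(teacher_logits).mean(axis=0)       # ensemble of teachers
    log_p_student = np.log(softmax(student_logits) + 1e-12)
    return -(ensemble * log_p_student).sum()              # cross-entropy to the ensemble

teachers = np.random.randn(3, 10)   # three toy teachers, vocabulary of 10
student = np.random.randn(10)
print(ensemble_then_distill_loss(teachers, student))
```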
Title: (KeyNote Talk) Hardware-aware Algorithms for Language Modeling
Presenter: Prof. Tri Dao
Bio: Tri Dao is an Assistant Professor at Princeton University and chief scientist of Together AI. He completed his PhD in Computer Science at Stanford. He works at the intersection of machine learning and systems, and his research interests include sequence models with long-range memory and structured matrices for compact deep learning models. His work has received the COLM 2024 Outstanding Paper Award and the ICML 2022 Outstanding Paper runner-up award.
Abstract: Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. We describe recent progress on subquadratic-time architectures such as structured state space models (SSMs). We identify that a key weakness of such models is their inability to perform content-based reasoning, and propose a selection mechanism to address this shortcoming. Though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks. The resulting architecture (Mamba and Mamba-2) matches or exceeds the performance of strong modern Transformers on language modeling, validated at 3B scales on both pretraining and downstream evaluation, while enjoying 5x higher inference throughput and linear scaling in sequence length. Hybridizing Mamba layers with 2-4 attention layers leads to state-of-the-art models, excelling at long context and fast inference.
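For intuition, here is a toy numpy caricature of a selective state-space recurrence, where the transition and input gates depend on the current input; it is a diagonal, sequential sketch for illustration only, not the Mamba architecture or its hardware-aware scan.

```python
# A selective state-space recurrence: unlike a fixed linear time-invariant SSM,
# the gates (a_t, b_t) are functions of the input, which is the "selection" idea.
import numpy as np

def selective_ssm_scan(x, w_a, w_b, w_c):
    d = x.shape[1]
    h = np.zeros(d)
    ys = []
    for x_t in x:                                    # sequential scan: O(L) time, O(1) state
        a_t = 1.0 / (1.0 + np.exp(-(x_t * w_a)))     # input-dependent decay in (0, 1)
        b_t = x_t * w_b                              # input-dependent input gate
        h = a_t * h + b_t * x_t                      # recurrent state update
        ys.append(w_c * h)                           # readout
    return np.stack(ys)

L, d = 6, 4
x = np.random.randn(L, d)
w_a, w_b, w_c = np.random.randn(d), np.random.randn(d), np.random.randn(d)
print(selective_ssm_scan(x, w_a, w_b, w_c).shape)    # (6, 4)
```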
Title: (KeyNote Talk) Speech generative modeling with little tokenization
Presenter: Dr. Navdeep Jaitly
Bio: Navdeep Jaitly is a Research Scientist at Apple Machine Learning Research (MLR), where he leads a team of researchers working on fundamental techniques in machine learning with an emphasis on speech and language. He received his PhD from the University of Toronto under the supervision of Geoffrey Hinton in the foundational days of deep learning. During a PhD internship at Google in 2011, he demonstrated how deep neural networks would revolutionize speech recognition, replacing the HMM systems in use at the time. After his PhD, he joined Google Brain, working on sequence models and techniques such as Listen, Attend and Spell, Adversarial Autoencoders, and Pointer Networks. He has also held machine learning research positions at Nvidia, Google Brain Robotics (initiating robotic ping pong), D. E. Shaw, and the National Labs.
Abstract: It is well accepted now that speech needs to be tokenized before it can be modeled with transformer-based generative models. In fact, there is a rich body of intricate work using semantic and other acoustic tokens for speech modeling. In this talk we show how tokenization may not be necessary and that, indeed, a simple way of discretizing Mel-spectrograms (which we call d-Mel) is enough to build generative models with transformers. We show how we can build conditional generative models of speech (text-to-speech) using d-Mel and transformer-based models. We also demonstrate that the same technique can be applied to multi-modal generation of speech conditioned on text and video. It is our hope that this leads to more exploration on minimal preprocessing of speech for use in generative modeling.
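A hedged numpy sketch of the "discretize the Mel-spectrogram directly" idea: each mel channel value is bucketed into a small number of uniform bins so that a frame becomes a tuple of integers a transformer can model. The bin count and dB range below are arbitrary placeholders, not the d-Mel specification.

```python
# Bucket log-mel values into uniform integer bins per channel.
import numpy as np

def discretize_mel(mel, num_bins=16, lo=-80.0, hi=0.0):
    # mel: (frames, channels) log-mel values in dB, clipped to [lo, hi]
    clipped = np.clip(mel, lo, hi)
    scaled = (clipped - lo) / (hi - lo)                  # -> [0, 1]
    return np.minimum((scaled * num_bins).astype(int), num_bins - 1)

mel = np.random.uniform(-80, 0, size=(100, 80))          # 100 frames, 80 mel channels
tokens = discretize_mel(mel)
print(tokens.shape, tokens.min(), tokens.max())          # (100, 80) 0 15
```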
Title: (KeyNote Talk) Optimizing Data Use for Efficient Pre-training
Presenter: Prof. Danqi Chen
Bio: Danqi Chen is an assistant professor of Computer Science at Princeton University and co-leads the Princeton NLP group. She is also an Associate Director of Princeton Language and Intelligence. Her recent research focuses on training, adapting and understanding large language models, especially with the goal of making them more accessible to academia. Before joining Princeton, Danqi was a visiting scientist at Facebook AI Research. She received her Ph.D. from Stanford University (2018) and her B.E. from Tsinghua University (2012), both in Computer Science. Her research was recognized by a Sloan Fellowship, an NSF CAREER award, a Samsung AI Researcher of the Year award, and outstanding paper awards from ACL and EMNLP.
Abstract: Training large language models relies heavily on the quality and composition of data, yet optimizing data selection and utilization remains a significant challenge in the field. In this talk, I will outline several key ideas to enhance training efficiency through better data use and cover several findings from my lab on selecting high-quality datasets and optimizing data compositions. I will also introduce a simple yet powerful pre-training approach that conditions on meta-data information associated with training data. This approach is remarkably straightforward to implement, incurs minimal computational overhead, and yields significant efficiency gains.
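One plausible reading of metadata-conditioned pre-training, sketched below with an invented tag format (the talk's actual recipe may differ): each training document is simply prefixed with a short metadata string, such as its source domain, so the model can associate style and quality with that context.

```python
# Prefix each training example with a metadata tag; the tag format is made up
# for illustration and is not taken from the talk.
def build_example(doc_text, source_domain, use_metadata=True):
    prefix = f"<meta source={source_domain}> " if use_metadata else ""
    return prefix + doc_text

batch = [
    build_example("The mitochondria is the powerhouse of the cell.", "en.wikipedia.org"),
    build_example("u wont believe this trick!!!", "spamsite.example"),
]
print(batch[0])
```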
Title: (Spotlight 1) Sparsified State-Space Models are Efficient Highway Networks
Presenter: Woomin Song
Authors: Woomin Song (KAIST), Jihoon Tack (KAIST), Sangwoo Mo (University of Michigan), Seunghyuk Oh (KAIST), Jinwoo Shin (KAIST)
Abstract: State-space models (SSMs) offer a promising architecture for sequence modeling, providing an alternative to Transformers by replacing expensive self-attention with linear recurrences. In this paper, we propose a simple yet effective trick to enhance SSMs within given computational budgets by sparsifying them. Our intuition is that tokens in SSMs are highly redundant due to gradual recurrent updates, and dense recurrence operations block the delivery of past information. In particular, we observe that upper layers of SSMs tend to be more redundant as they encode global information, while lower layers encode local information. Motivated by this, we introduce Simba, a hierarchical sparsification method for SSMs based on token pruning. Simba sparsifies upper layers more than lower layers, encouraging the upper layers to behave like highways. To achieve this, we propose a novel token pruning criterion for SSMs, measuring the global impact of tokens on the final output by accumulating local recurrences. We demonstrate that Simba outperforms the baseline model, Mamba, with the same FLOPS in various natural language tasks. Moreover, we illustrate the effect of highways, showing that Simba not only enhances efficiency but also improves the information flow across long sequences.
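For intuition only, the toy numpy sketch below prunes a shrinking fraction of tokens at higher layers using a placeholder importance score (an L2 norm); Simba's actual criterion accumulates local recurrences to measure each token's global impact on the output.

```python
# Hierarchical token pruning: higher layers keep a smaller fraction of tokens.
import numpy as np

def prune_tokens(hidden, layer_idx, num_layers, min_keep=0.25):
    # hidden: (tokens, dim); keep ratio shrinks linearly with depth
    keep_ratio = 1.0 - (1.0 - min_keep) * layer_idx / max(num_layers - 1, 1)
    k = max(1, int(round(keep_ratio * hidden.shape[0])))
    scores = np.linalg.norm(hidden, axis=1)          # placeholder importance score
    keep = np.sort(np.argsort(-scores)[:k])          # top-k tokens, original order
    return hidden[keep], keep

h = np.random.randn(16, 8)
for layer in range(4):
    h, kept = prune_tokens(h, layer, num_layers=4)
    print(f"layer {layer}: kept {h.shape[0]} tokens")
```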
Title: (Spotlight 2) Longhorn: State Space Models are Amortized Online Learners
Presenter: Bo Liu
Authors: Bo Liu (University of Texas at Austin), Rui Wang (Helixon), Lemeng Wu (University of Texas at Austin), Yihao Feng (University of Texas at Austin), Peter Stone (University of Texas at Austin and Sony AI), Qiang Liu (UT Austin)
Abstract: Modern large language models are built on sequence modeling via next-token prediction. While the Transformer remains the dominant architecture for sequence modeling, its quadratic decoding complexity in sequence length poses a major limitation. State-space models (SSMs) present a competitive alternative, offering linear decoding efficiency while maintaining parallelism during training. However, most existing SSMs rely on linear recurrence designs that appear somewhat ad hoc. In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems. This approach links SSM design to formulating precise online learning objectives, with state transition rules derived from solving these objectives. Based on this insight, we introduce a novel deep SSM architecture, Longhorn, whose update resembles the closed-form solution for solving the online associative recall problem. Our experimental results show that Longhorn outperforms state-of-the-art SSMs, including the Mamba model, on standard sequence modeling benchmarks, language modeling, and vision tasks. Specifically, Longhorn achieves a 1.8x improvement in sample efficiency compared to Mamba, and can extrapolate over contexts that are up to 16x longer during inference.
Title: (Spotlight 3) GEAR: An Efficient Error Reduction Framework for KV Cache Compression in LLM Inference
Presenter: Hao Kang
Authors: Hao Kang (Georgia Institute of Technology), Qingru Zhang (Georgia Institute of Technology)*, Souvik Kundu (Intel Labs), Geonhwa Jeong (Georgia Institute of Technology), Zaoxing Liu (University of Maryland), Tushar Krishna (Georgia Institute of Technology), Tuo Zhao (Georgia Tech)
Abstract: Key-value (KV) caching has become the de-facto technique to accelerate generation speed for large language model (LLM) inference. However, the growing cache demand with increasing sequence length has transformed LLM inference into a memory-bound problem, significantly constraining system throughput. Existing methods rely on dropping unimportant tokens or quantizing entries group-wise. Such methods, however, often incur high approximation errors to represent the compressed matrices. The autoregressive decoding process further compounds the error of each step, resulting in critical deviations in model generation and deterioration of performance. To tackle this challenge, we propose GEAR, an efficient error-reduction framework that augments a quantization scheme with two error-reduction components and achieves near-lossless performance at high compression ratios. GEAR first quantizes the majority of entries of similar magnitudes to ultra-low precision. It then employs a low-rank matrix to approximate the quantization error, and a sparse matrix to remedy individual errors from outlier entries. By adeptly integrating the three techniques, GEAR is able to fully exploit their synergistic potential. Our experiments show that GEAR can maintain accuracy similar to that of an FP16 cache, with improvements of up to 24.42% over the SOTA baselines at 2-bit compression. Additionally, compared to LLM inference with an FP16 KV cache, GEAR can reduce peak memory by up to 2.39x, bringing 2.1x-5.07x throughput improvement. Our code will be publicly available.
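A simplified numpy sketch of the three-part recipe described above, with arbitrary bit-width, rank, and outlier fraction (not GEAR's exact scheme): uniform low-bit quantization, a low-rank SVD approximation of the quantization error, and a sparse correction for the largest remaining outliers.

```python
# Quantize, approximate the error with a low-rank term, then fix outliers sparsely.
import numpy as np

def gear_like_compress(kv, bits=2, rank=4, outlier_frac=0.01):
    lo, hi = kv.min(), kv.max()
    levels = 2 ** bits - 1
    q = np.round((kv - lo) / (hi - lo) * levels)              # quantized codes
    deq = q / levels * (hi - lo) + lo                         # dequantized values
    err = kv - deq                                            # quantization error
    U, S, Vt = np.linalg.svd(err, full_matrices=False)
    low_rank = (U[:, :rank] * S[:rank]) @ Vt[:rank]           # low-rank error term
    resid = err - low_rank
    k = max(1, int(outlier_frac * resid.size))
    thresh = np.partition(np.abs(resid).ravel(), -k)[-k]
    sparse = np.where(np.abs(resid) >= thresh, resid, 0.0)    # sparse outlier term
    return deq + low_rank + sparse                            # reconstructed KV

kv = np.random.randn(256, 128)
rec = gear_like_compress(kv)
print("mean abs error:", np.abs(kv - rec).mean())
```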
Title: (Spotlight 4) An Evolved Universal Transformer Memory
Presenter: Edoardo Cetin
Authors: Edoardo Cetin (Sakana AI)*, Qi Sun (Tokyo Institute of Technology), Tianyu Zhao (Sakana AI), Yujin Tang (Sakana AI)
Abstract: We introduce Neural Attention Memory Models (NAMMs) to improve the performance and efficiency of transformer foundation models. NAMMs are evolved atop pre-trained transformers to provide different latent contexts containing the most relevant information for individual layers and attention heads. NAMMs are universally applicable to any model using self-attention as they condition exclusively on the attention matrices produced in each layer. NAMMs learned on a relatively small set of problems substantially improve performance across multiple unseen long-context language tasks while cutting the model's input contexts to a fraction of the original sizes, setting them apart from prior hand-designed KV cache eviction strategies that only aim to preserve model behavior. We show the generality of our conditioning enables zero-shot transfer of NAMMs trained only on language to entirely new transformer architectures even across input modalities, with their benefits carrying over to vision and reinforcement learning.
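As a rough illustration of the interface NAMMs operate on, the toy numpy function below scores cached tokens from recent attention weights and keeps only the highest-scoring entries; the hand-written mean-attention score is only a placeholder for the evolved NAMM scorer.

```python
# Score cached tokens by the attention they receive and keep the top entries.
import numpy as np

def evict_kv(attn, keep):
    # attn: (queries, cached_tokens) recent attention weights
    score = attn.mean(axis=0)                    # average attention received per token
    keep_idx = np.sort(np.argsort(-score)[:keep])
    return keep_idx                              # indices of KV entries to retain

attn = np.random.rand(8, 32)
attn /= attn.sum(axis=1, keepdims=True)
print(evict_kv(attn, keep=16))
```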
Title: (Spotlight 5) OLMoE: Open Mixture-of-Experts Language Models
Presenter: Luca Soldaini
Authors: Niklas Muennighoff (Contextual AI/Allen Institute for Artificial Intelligence)*, Luca Soldaini (Allen Institute for Artificial Intelligence), Dirk Groeneveld (Allen Institute for Artificial Intelligence), Kyle Lo (Allen Institute for Artificial Intelligence), Jacob Morrison (Allen Institute for AI), Sewon Min (University of Washington), Weijia Shi (University of Washington), Pete Walsh (Allen Institute for Artificial Intelligence), Oyvind Tafjord (AI2), Nathan Lambert (Allen Institute for Artificial Intelligence), Yuling Gu (Allen Institute for Artificial Intelligence), Shane Arora (Allen Institute for Artificial Intelligence), Akshita Bhagia (Allen Institute for Artificial Intelligence), Dustin Schwenk (Allen Institute for Artificial Intelligence), David Wadden (Allen Institute for Artificial Intelligence), Alexander Wettig (Princeton University), Binyuan Hui (Alibaba Group), Tim Dettmers (Allen Institute for Artificial Intelligence), Douwe Kiela (Contextual AI), Noah Smith (Allen Institute for AI, University of Washington), Pang Wei Koh (Allen Institute for AI, University of Washington), Amanpreet Singh (Contextual AI), Hannaneh Hajishirzi (Allen Institute for AI, University of Washington)
Abstract: We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present novel findings on MoE training, define and analyze new routing properties showing high specialization in our model, and open-source all our work: model weights, training data, code, and logs.
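A minimal numpy sketch of the sparse Mixture-of-Experts mechanism behind the "many parameters, few active per token" property; the shapes, gating, and top-k value here are illustrative and not the OLMoE configuration.

```python
# A router scores all experts per token, but only the top-k experts are run.
import numpy as np

def moe_layer(x, router_w, expert_ws, k=2):
    # x: (tokens, dim), router_w: (dim, num_experts), expert_ws: (num_experts, dim, dim)
    logits = x @ router_w
    topk = np.argsort(-logits, axis=1)[:, :k]                 # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = np.exp(logits[t, topk[t]])
        gates /= gates.sum()                                  # renormalized gate weights
        for g, e in zip(gates, topk[t]):
            out[t] += g * (x[t] @ expert_ws[e])               # only k experts execute
    return out

tokens, dim, num_experts = 4, 8, 16
x = np.random.randn(tokens, dim)
print(moe_layer(x, np.random.randn(dim, num_experts),
                np.random.randn(num_experts, dim, dim)).shape)
```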
Title: (Spotlight 6) RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Presenter: Huiqiang Jiang
Authors: Di Liu (Shanghai Jiao Tong University), Meng Chen (Fudan University), Baotong Lu (Microsoft Research)*, Huiqiang Jiang (Microsoft Research Asia), Zhenhua Han (Microsoft), Qianxi Zhang (MSRA), Qi Chen (Microsoft Research Asia), Chengruidong Zhang (MSFT), Bailu Ding (Microsoft Research), Kai Zhang (Fudan University), Chen Chen (Shanghai Jiao Tong University), Fan Yang (MSRA), Yuqing Yang (Microsoft), Lili Qiu (Microsoft Research Asia)
Abstract: Transformer-based Large Language Models (LLMs) have become increasingly important. However, due to the quadratic time complexity of attention computation, scaling LLMs to longer contexts incurs extremely slow inference latency and high GPU memory consumption for caching key-value (KV) vectors. This paper proposes RetrievalAttention, a training-free approach to both accelerate attention computation and reduce GPU memory consumption. By leveraging the dynamic sparsity of the attention mechanism, RetrievalAttention proposes to use approximate nearest neighbor search (ANNS) indexes for KV vectors in CPU memory and retrieves the most relevant ones with vector search during generation. Unfortunately, we observe that off-the-shelf ANNS indexes are often ineffective for such retrieval tasks due to the out-of-distribution (OOD) nature of query vectors relative to key vectors in the attention mechanism. RetrievalAttention addresses the OOD challenge by designing an attention-aware vector search algorithm that can adapt to the distribution of query vectors. Our evaluation shows that RetrievalAttention only needs to access 1-3% of the data while maintaining high model accuracy. This leads to a significant reduction in the inference cost of long-context LLMs with a much lower GPU memory footprint. In particular, RetrievalAttention only needs a single NVIDIA RTX 4090 (24GB) to serve 128K tokens in LLMs with 8B parameters, and is capable of generating one token in 0.188 seconds.
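The brute-force numpy sketch below illustrates the sparse-attention arithmetic: each query attends only over its retrieved top-k keys. A real system would replace the exact top-k search with an ANNS index held in CPU memory, which is the part this toy example omits.

```python
# Attend only over the keys "retrieved" for a query (exact top-k as a stand-in).
import numpy as np

def retrieval_attention(q, K, V, k=8):
    # q: (dim,), K/V: (cached_tokens, dim)
    scores = K @ q / np.sqrt(q.shape[0])
    idx = np.argsort(-scores)[:k]                 # retrieved keys (exact top-k here)
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()
    return w @ V[idx]                             # attention over the retrieved subset only

dim, cache = 64, 4096
q = np.random.randn(dim)
K, V = np.random.randn(cache, dim), np.random.randn(cache, dim)
print(retrieval_attention(q, K, V).shape)         # (64,)
```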
Title: (Spotlight 7) Post-Training Statistical Calibration for Higher Activation Sparsity
Presenter: Vui Seng Chua
Authors: Vui Seng Chua (Intel Corporation), Yujie Pan (Intel)*, Nilesh Jain (Intel)
Abstract: We present Statistical Calibrated Activation Pruning (SCAP), a post-training activation pruning framework that (1) generalizes sparsification of the input activations of fully-connected layers for generic and flexible application across Transformers, and (2) features a simple mode-centering technique to pre-calibrate activation distributions for maximizing post-training sparsity. Our results demonstrate robust Pareto efficiency compared to prior methods, translating to a 1.5x additional LLM decoding speedup against CATS [12] at iso model quality. SCAP effectiveness is empirically verified across a wide range of models, including recent Transformer decoders, MoE, Mamba2, encoding Transformers, and pre-quantized models, highlighting its practicality and scalability. The code is available at https://github.com/IntelLabs/SCAP.
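A simplified numpy sketch of post-training activation sparsification with mode-centering, using a median as a cheap stand-in for the per-channel mode and an arbitrary threshold (not the calibrated SCAP procedure): activations are shifted, small-magnitude entries are dropped, and the shift is folded back into the layer as a bias.

```python
# Center input activations, prune small entries, and recover the shift as a bias.
import numpy as np

def sparse_fc(x, W, threshold=0.1):
    # x: (batch, d_in), W: (d_in, d_out)
    mode = np.median(x, axis=0)                 # cheap stand-in for the per-channel mode
    centered = x - mode
    pruned = np.where(np.abs(centered) > threshold, centered, 0.0)
    bias = mode @ W                             # fold the shift back in exactly
    return pruned @ W + bias, 1.0 - np.count_nonzero(pruned) / pruned.size

x = np.random.randn(32, 256) * 0.2 + 0.5
W = np.random.randn(256, 128) / 16
y, sparsity = sparse_fc(x, W)
print(y.shape, f"activation sparsity: {sparsity:.2f}")
```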
Title: (Spotlight 8) Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences
Presenter: Niklas Schmidinger
Authors: Niklas Schmidinger (Johannes Kepler University Linz)*, Lisa Schneckenreiter (Johannes Kepler University, Linz), Philipp Seidl (JKU Linz), Johannes Schimunek (Johannes Kepler University Linz), Pieter-Jan Hoedt (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria), Johannes Brandstetter (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria), Andreas Mayr (Johannes Kepler University Linz), Sohvi Luukkonen (Johannes Kepler University), Sepp Hochreiter (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University, NXAI GmbH, Linz, Austria), Guenter Klambauer (LIT AI Lab)
Abstract: Language models for biological and chemical sequences enable crucial applications such as drug discovery, protein engineering, and precision medicine. Currently, these language models are predominantly based on Transformer architectures. While Transformers have yielded impressive results, their quadratic runtime dependency on sequence length complicates their use for long genomic sequences and in-context learning on proteins and chemical sequences. Recently, the recurrent xLSTM architecture has been shown to perform favorably compared to Transformers and modern state-space models (SSMs) in the natural language domain. Similar to SSMs, xLSTMs have linear runtime dependency and allow for constant-memory decoding at inference time, which makes them prime candidates for modeling long-range dependencies in biological and chemical sequences. In this work, we tailor xLSTM towards these domains and we propose a suite of language models called Bio-xLSTM. Extensive experiments in three large domains, genomics, proteins, and chemistry, were performed to assess xLSTM's ability to model biological and chemical sequences. The results show that Bio-xLSTM is a highly proficient generative model for DNA, protein, and chemical sequences, learns rich representations, and can perform in-context learning for proteins and small molecules.
Title: (Spotlight 9) Inference-Friendly Models With MixAttention
Presenter: Shashank Rajput
Authors: Shashank Rajput (Databricks)*, Ying Sheng (NA), Sean Owen (Databricks), Vitaliy Chiley (Cerebras)
Abstract: The size of the key-value (KV) cache plays a critical role in determining both the maximum context length and the number of concurrent requests supported during inference in modern language models. The KV cache size grows proportionally with the number of attention heads and the tokens processed, leading to increased memory consumption and slower inference for long inputs. In this work, we explore the use of MixAttention, a model architecture modification closely related to a blog published by Character.AI. MixAttention combines sliding window attention, where only a small subset of recent tokens is stored in the KV cache, with KV cache sharing across layers. Our experiments demonstrate that MixAttention significantly reduces memory usage and improves inference speed without sacrificing model performance in both short- and long-context tasks. We also explore various configurations of this architecture, identifying those that maintain quality across evaluation metrics while optimizing resource efficiency.
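A toy Python sketch of the two ingredients described above: sliding-window KV caches that keep only recent tokens for some layers, and a single cache object shared by a group of layers so its memory is paid once. The layer grouping and window size are invented for illustration.

```python
# Sliding-window KV caches shared across layer groups.
import numpy as np

class SlidingKVCache:
    def __init__(self, window):
        self.window, self.keys, self.values = window, [], []

    def append(self, k, v):
        self.keys.append(k); self.values.append(v)
        if self.window and len(self.keys) > self.window:   # evict the oldest token
            self.keys.pop(0); self.values.pop(0)

# Layers 0-3 share one full cache; layers 4-7 share one sliding-window cache.
shared_full = SlidingKVCache(window=None)
shared_window = SlidingKVCache(window=4)
layer_cache = [shared_full] * 4 + [shared_window] * 4

for step in range(10):                                     # simulate decoding 10 tokens
    k, v = np.random.randn(8), np.random.randn(8)
    for cache in {id(c): c for c in layer_cache}.values(): # write each shared cache once
        cache.append(k, v)
print(len(shared_full.keys), len(shared_window.keys))      # 10 4
```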
Title: (Spotlight 10) One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
Presenter: Fabian Paischer
Authors: Fabian Paischer (ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria)*, Lukas Hauzenberger (ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz), Thomas Schmied (ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz), Benedikt Alkin (Institute for Machine Learning), Marc Deisenroth (University College London), Sepp Hochreiter (LIT AI Lab, In
Abstract: TBD
Title: (KeyNote Talk) The LoRA Journey and Learnings: from Creation to Industrial-Scale Adoption
Presenter: Dr. Weizhu Chen
Bio: Weizhu Chen is the Vice President leading the Microsoft GenAI modeling team, driving innovation in large-scale AI model training, including pre-training, post-training, and evaluation for both Microsoft and OpenAI. Under his leadership, the team has pioneered groundbreaking advancements such as LoRA, DeBERTa, and the Phi-3 models. With over 19 years at Microsoft, Weizhu has held pivotal roles in shaping AI and machine learning technologies. Previously, he served as Partner Science Manager at Microsoft Azure AI and led teams in the Business Applications Group and Research divisions, focusing on deep learning, NLP, and distributed machine learning at cloud scale. Before joining Microsoft, he contributed to research on information retrieval at IBM Research. Weizhu's career reflects a deep commitment to advancing the state of AI, making a significant impact on the field and enabling transformative technologies.
Abstract: TBD
Title: (KeyNote Talk) How to build fully open language models: from pre-training to post-training
Presenter: Prof. Hannaneh Hajishirzi
Bio: Hannaneh Hajishirzi is the Torode Family Associate Professor in the Allen School of Computer Science and Engineering at the University of Washington and a Senior Director of NLP at AI2. Her current research delves into various domains within Natural Language Processing (NLP) and Artificial Intelligence (AI), with a particular emphasis on accelerating the science of language modeling, broadening its scope, and enhancing its applicability and usefulness for human lives. She has published over 140 scientific articles in prestigious journals and conferences across ML, AI, NLP, and Computer Vision. She is the recipient of numerous awards, including the Sloan Fellowship, NSF CAREER Award, Intel Rising Star Award, Allen Distinguished Investigator Award, Academic Achievement UIUC Alumni Award, and Innovator of the Year Award by GeekWire. The work from her lab has been nominated for or has received best paper awards at various conferences and has been featured in numerous magazines and newspapers.
Abstract: Language models (LMs) have become ubiquitous in both AI research and commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. In this talk, I present our OLMo project, aimed at building strong language models and making them fully accessible to researchers, along with open-source code for data, training, and inference. Training language models is expensive, so we optimize for quality versus compute cost. I focus on how data, architecture, and training improvements advance models at the pre-training and post-training stages with less compute cost.
Time | Title | Presenter
8:00AM - 8:15AM | Breakfast |
8:15AM - 8:30AM | Opening Remarks |
8:30AM - 9:00AM | (KeyNote Talk) Efficiency through Learning from Experience | Dr. Bhavana Dalvi Mishra
9:00AM - 9:30AM | (KeyNote Talk) Multi-Teacher Distillation: An Ensemble-Then-Distill Approach | Prof. Lili Mou
9:30AM - 10:00AM | Morning Break |
10:00AM - 10:30AM | (KeyNote Talk) Hardware-aware Algorithms for Language Modeling | Prof. Tri Dao
10:30AM - 11:00AM | (KeyNote Talk) Speech generative modeling with little tokenization | Dr. Navdeep Jaitly
11:00AM - 11:30AM | (KeyNote Talk) Optimizing Data Use for Efficient Pre-training | Prof. Danqi Chen
11:30AM - 11:36AM | (Spotlight 1) Sparsified State-Space Models are Efficient Highway Networks | Woomin Song
11:36AM - 11:42AM | (Spotlight 2) Longhorn: State Space Models are Amortized Online Learners | Bo Liu
11:42AM - 11:48AM | (Spotlight 3) GEAR: An Efficient Error Reduction Framework for KV Cache Compression in LLM Inference | Hao Kang
11:48AM - 11:54AM | (Spotlight 4) An Evolved Universal Transformer Memory | Edoardo Cetin
11:54AM - 12:00PM | (Spotlight 5) OLMoE: Open Mixture-of-Experts Language Models | Luca Soldaini
12:00PM - 1:30PM | Lunch Break |
12:30PM - 1:30PM | Poster Session I (Paper IDs #1 - #50) [Link to Posters] |
1:30PM - 1:36PM | (Spotlight 6) RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | Huiqiang Jiang
1:36PM - 1:42PM | (Spotlight 7) Post-Training Statistical Calibration for Higher Activation Sparsity | Vui Seng Chua
1:42PM - 1:48PM | (Spotlight 8) Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences | Niklas Schmidinger
1:48PM - 1:54PM | (Spotlight 9) Inference-Friendly Models With MixAttention | Shashank Rajput
1:54PM - 2:00PM | (Spotlight 10) One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation | Fabian Paischer
2:00PM - 2:30PM | (KeyNote Talk) The LoRA Journey and Learnings: from Creation to Industrial-Scale Adoption | Dr. Weizhu Chen
2:30PM - 3:00PM | (KeyNote Talk) How to build fully open language models: from pre-training to post-training | Prof. Hannaneh Hajishirzi
3:00PM - 3:30PM | Afternoon Break |
3:30PM - 4:20PM | Interactive Panel Discussion | Marjan Ghazvininejad, Joel Hestness, Navdeep Jaitly, Katie Derthick
4:20PM - 4:30PM | Best Paper Awards and Closing Remarks |
4:30PM - 5:30PM | Poster Session II (Paper IDs #51 - #105) [Link to Posters] |
Organizers
Mehdi Rezagholizadeh
Huawei Noah's Ark Lab
Yu Cheng
Chinese University of Hong Kong
Yue Dong
University of California, Riverside
Vahid Partovi Nia
Ecole Polytechnique Montreal & Huawei
Qun Liu
Huawei Noah's Ark Lab
Boxing Chen
Huawei Noah's Ark Lab
Volunteers
David Alfonso-Hermelo
Huawei Noah's Ark Lab
Khalil Bibi
Haven Studios
Mahsa Ghazvini Nejad
Huawei Noah's Ark Lab
Ali Edalati
Huawei Noah's Ark Lab
Technical Committee
- Dasgupta Sabyasachi (Sanofi)
- Dan Alistarh (ISTA)
- Vahid Partovi Nia (Ecole Polytechnique Montreal & Huawei)
- Tanya Roosta (Amazon)
- Peyman Passban (Sanofi)
- Ehsaneddin Asgari (QCRI)
- Hamidreza Saghir (Microsoft)
- Yue Dong (University of California, Riverside)
- Ruijiang Li (Sanofi)
- Abbas Ghaddar (Huawei Noah's Ark Lab)
- Alireza Ghaffari (McGill University)
- Yu Cheng (Chinese University of Hong Kong)
- Jahangir Alam (CRIM-Montreal)
- Hamidreza Mahyar (McMaster University)
- Yufei Cui (Huawei Noah's Ark Lab)
- Mahdi Biparva (Huawei Noah's Ark Lab)
- Soheila Samiee (BASF)
- Walid Ahmed (Huawei Technologies Canada)
- Ehsan Kamalloo (Service Now Research)
- Anderson Avila (INRS-EMT)
- Abbas Rahimi (IBM)
- David Alfonso Hermelo (Huawei Noah's Ark Lab)
- Makesh Narsimhan Sreedhar (NVIDIA)
- Ahmad Rashid (University of Waterloo & Vector Institute)
- Suyuchen Wang (Universite de Montreal & Mila)
- Tianyu Jiang (University of Cincinnati)
- Peilin Yu (Brown University)
- Khalil Bibi
- Aysegul Bumin (Amazon)
- Abderrahim Fathan (CRIM-Montreal)
- Aref Jafari (University of Waterloo)
- Dan Fu (Stanford University)
- Anusha Sabbineni (Amazon)
- Parsa Omidi (Huawei Technologies Canada)
- Young Jin Kim (Microsoft)
- Giovanni Monea (EPFL)
- Mofetoluwa Adeyemi (University of Waterloo)
- Xindi Wang (University of Western Ontario)
- Alessio Brutti (Fondazione Bruno Kessler)
- Saleh Ashkboos (ETH Zurich)
- Parsa Kavehzadeh (Huawei Noah's Ark Lab)
- Hossein Rajabzadeh (University of Waterloo)
- Mohammadreza Tayaranian (McGill University)
- Varun Gangal (ASAPP Inc.)
- Sebastian Jaszczur (IDEAS NCBR, University of Warsaw)
- Ali Edalati (Huawei Noah's Ark Lab)
- Mojtaba Valipour (University of Waterloo)
- Heitor Guimarães (INRS University)
- Jing Li (Mitsubishi Electric Research Laboratories)
- Mohammad Ruhul Amin (Fordham University)
- Mohammad Dehghan (Autodesk)
- Raffy Fahim (Microsoft)
- Feiyang Kang (Virginia Tech University)
- Ning Shi (University of Alberta)
- Daria Soboleva (Cerebras Systems)
- Qingru Zhang (Georgia Institute of Technology)
- Lilly Kumari (University of Washington)
- Thomas Ortner (IBM Research Zurich - Europe)
- Dominik Wagner (Technische Hochschule Nuernberg)
- Benyamin Jamialahmadi (University of Waterloo)
- Tianshu Zhu (Huawei Noah's Ark Lab)
- Haoran Zhao (Drexel University & University of Washington)
- Satya Sai Srinath Namburi (Amazon)
- Mouloud Belbahri (Layer 6 AI)
- Abhishek Panigrahi (Princeton University)
- Arthur Pimentel (INRS)
- Mahsa Salmani (Huawei Technologies Canada)
- Mohammad Ali Alomrani (Huawei Noah's Ark Lab)
- Abdul Hameed Azeemi (Lahore University)
- Mohammadreza Pourreza (Google Research)
- Yunan Zhang (Microsoft)
- MohammadAli SadraeiJavaheri (Sharif University)
- Omid Ghahroodi (Sharif University)
- Adam Lee (UC Berkeley)
Platinum Sponsor
Gold Sponsors