The fourth version of the Efficient Natural Language and Speech Processing (ENLSP-IV) workshop will focus on how to make large language and foundation models more efficient in terms of Architecture, Training, and Inference in their real-world applications. This year, following the trend in industry and academia, we put more emphasis on investigating new architectures to make future language and foundation models more efficient. Moreover, we highlight the importance of comprehensive evaluation and benchmarking of new efficient models from different practical aspects. The workshop program offers an interactive platform for gathering experts and talent from academia and industry through invited talks, a panel discussion, paper submissions, reviews, interactive poster sessions, oral presentations, and mentorship sessions for new researchers. It will be a unique opportunity to discuss and share challenging problems, build connections, exchange ideas, brainstorm, and foster future collaborations. The topics of this workshop are of interest to people working on general machine learning, deep learning, hardware, optimization, theory, and applications.

Overview

As large language models (e.g. GPT-3, GPT-4, Llama 3, PaLM, Gemini, and PanGu-Σ), pre-trained speech models (e.g. wav2vec, HuBERT, WavLM, Whisper, Conformer-1 and Conformer-2) and other foundation models (e.g. GPT-4o and Stable Diffusion) advance rapidly and become more prominent and widespread, improving their efficiency becomes ever more crucial. While computational power and GPU resources have played a significant role in the success of these models, we also need to be aware that relying on more computational resources (a) increases the cost of training and deploying such models, (b) makes the models less accessible, (c) limits contributions from the broader research community, and (d) increases their environmental cost. Moreover, most of these pre-trained models are heavily over-parameterized, so their efficiency is questionable, and this lack of efficiency can largely limit their application in practice.

Building upon the framework of our previous three editions, this workshop remains dedicated to investigating solutions for enhancing the efficiency of pre-trained language and foundation models, while introducing some fresh and important topics to the community and encouraging contributions on them. To highlight a few: (1) Despite the ubiquitous use of Transformers, they suffer from quadratic computational complexity, which limits their efficiency especially for longer sequence lengths. Should we improve the efficiency of Transformers (e.g. as in Hedgehog or Gated Linear Attention) or look for other architectures (e.g. Mamba, Jamba, RWKV, xLSTM, and SSMs)? (2) For accelerating training, we have seen the significant impact of hardware-efficient implementations such as FlashAttention. Should we focus more on these hardware-aware solutions or on new/improved architectures? (3) For efficient inference, there are solutions such as speculative decoding [Link1] [Link2], whose performance is strongly model- and task-dependent and which requires the draft and target models to share the same vocabulary (tokenizer); improved KV-caching (e.g. [Link]), which offers a limited speed-up; and many-in-one models such as SortedNet, MatFormer, and LayerSkip, whose sub-models perform worse than their corresponding individually trained models. (4) While there are many so-called efficient solutions in the literature, there is still no fair, comprehensive, and practical evaluation of these models or comparison among them. For example, we do not know the extent of hallucination in the new architectures compared with Transformers (e.g. in [Link]).
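
To make item (3) above more concrete, the following minimal Python/PyTorch sketch illustrates the draft-then-verify loop behind speculative decoding (a greedy variant, for brevity). The function name and the Hugging Face-style .logits interface are illustrative assumptions, not the exact methods cited above.

    import torch

    @torch.no_grad()
    def speculative_step(draft_model, target_model, prefix_ids, k=4):
        # Draft-then-verify (greedy variant). Both models are assumed to be causal LMs
        # that share the same tokenizer/vocabulary and return a .logits tensor.
        draft_ids = prefix_ids
        for _ in range(k):  # the small draft model cheaply proposes k tokens
            logits = draft_model(draft_ids).logits[:, -1, :]
            draft_ids = torch.cat([draft_ids, logits.argmax(-1, keepdim=True)], dim=-1)

        # The large target model verifies all k proposals in one forward pass.
        target_logits = target_model(draft_ids).logits

        accepted = prefix_ids
        for i in range(prefix_ids.shape[1], draft_ids.shape[1]):
            target_next = target_logits[:, i - 1, :].argmax(-1, keepdim=True)
            accepted = torch.cat([accepted, target_next], dim=-1)
            if not torch.equal(target_next, draft_ids[:, i:i + 1]):
                break  # first disagreement: keep the target's own token and stop
        return accepted

When the draft model agrees with the target most of the time, several tokens are accepted per expensive target forward pass, which is where the speed-up comes from; how often this happens is exactly the model- and task-dependence mentioned above.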

Call for Papers

Investing in the future of language and foundation models requires a concrete effort to enhance their efficiency across multiple dimensions (including architecture, training, and inference) and to establish comprehensive evaluation frameworks. To encourage engagement from the NeurIPS community, we present several active research topics in this field that invite participation and contributions. The scope of this workshop includes, but is not limited to, the following topics:

Efficient Architectures Proposing alternative architectures that are more efficient than Transformers (in terms of computational complexity, memory footprint, and handling of longer sequence lengths), or modifying Transformer architectures to make them more efficient

  • Linear and sub-quadratic Transformers, sparse-attention Transformers (a minimal linear-attention sketch follows this list)
  • New architectures for LLMs and foundation models and their scalability
  • Evaluation and benchmarking of new architectures (fair comparison of different models)
  • Long sequence modeling
  • Dense vs. sparse architectures (MoEs)
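
As a rough illustration of the linear-attention bullet above, the sketch below (single head, no batch dimension, both simplifying assumptions) replaces the O(n²) attention matrix with an O(n·d²) computation using a common positive feature map (elu(x) + 1):

    import torch
    import torch.nn.functional as F

    def linear_attention(q, k, v, eps=1e-6):
        # q, k: (n, d); v: (n, d_v). A positive feature map keeps the normalizer valid.
        q, k = F.elu(q) + 1, F.elu(k) + 1
        # Summarize keys/values into a (d, d_v) state instead of an (n, n) matrix.
        kv = torch.einsum("nd,ne->de", k, v)
        z = 1.0 / (q @ k.sum(dim=0) + eps)   # per-query normalizer, shape (n,)
        return (q @ kv) * z.unsqueeze(-1)    # output (n, d_v), cost O(n * d * d_v)
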
Efficient Training How can we reduce the cost of pre-training or fine-tuning new models?
  • More efficient pre-training solutions, from better initialization and hyper-parameter tuning to better optimization, all of which lower the cost of pre-training
  • Parameter-efficient fine-tuning (PEFT) solutions for large pre-trained models (see the sketch after this list)
  • Efficient instruction tuning, prompt engineering and in-context learning
  • Hardware-aware solutions (e.g. better CUDA kernels), memory read/write aware solutions
  • Data-efficient training, reducing the requirement for labeled data, data compression and distillation
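
As referenced in the PEFT bullet above, here is a minimal LoRA-style adapter sketch: a frozen linear layer is augmented with a trainable low-rank update, so only a small fraction of parameters is fine-tuned. The class name and default hyper-parameters are illustrative assumptions, not a reference implementation of any particular PEFT library.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Wraps a frozen nn.Linear and adds a trainable low-rank update:
        # y = W x + (alpha / r) * B A x, with only A and B being trained.
        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False
            self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
            self.scaling = alpha / r

        def forward(self, x):
            return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
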
Efficient Inference How can we reduce the cost of inference for LLMs and foundation models?
  • Improved speculative sampling for LLMs, self-speculative sampling, selecting among multiple drafts, one draft model for different heterogeneous target models
  • Neural model compression techniques such as quantization, pruning, and knowledge distillation
  • Improved KV-caching solutions for Transformers (see the toy cache sketch after this list)
  • Distributed inference of large pre-trained models
  • Serving many target devices with one model, many-in-one models, early exiting, elastic networks
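
To ground the KV-caching bullet above, the toy cache below simply appends the keys and values of each newly generated token, so each decoding step attends over the stored prefix instead of re-encoding it. The tensor layout and update interface are assumptions for illustration only; real systems add eviction, quantization, or sharing strategies on top of this.

    import torch

    class KVCache:
        # Per-layer cache with keys/values of shape (batch, heads, seq_len, head_dim).
        def __init__(self):
            self.k, self.v = None, None

        def update(self, k_new, v_new):
            # Append the current step's keys/values along the sequence dimension.
            self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=2)
            self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=2)
            return self.k, self.v
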
Evaluation and Benchmarking of Efficient Models The introduction of new efficient solutions underscores the need for comprehensive benchmarks that accurately evaluate their efficacy and performance.
  • Datasets, benchmarks, leaderboards for evaluating efficient models
  • Benchmarking the performance of efficient models from different perspectives such as reasoning, hallucination, understanding, and generation quality
  • Benchmarking the efficiency of models in terms of memory footprint, training time, and inference time, across different target hardware devices and inference platforms (e.g. GPU vs. CPU)
Efficient Solutions in Other Modalities and Applications
  • Efficiency of foundation or pre-trained models in multi-modal setups and other modalities (beyond NLP and speech) such as biology, chemistry, computer vision, and time series
  • Efficient representations (e.g. Matryoshka representation) and models in dense retrieval and search
  • Efficient federated learning, lower communication costs, tackling heterogeneous data and models
  • Efficient graph and LLM joint learning

Submission Instructions

You are invited to submit your papers via our CMT submission portal (Link). All submissions must be anonymized for double-blind review. We expect each paper to be reviewed by at least three reviewers. The content of the paper (excluding references and supplementary materials) should be at most 8 pages for Long Papers and 4 pages for Short Papers, strictly following the NeurIPS template style (Link).
Authors can submit up to 100 MB of supplementary materials separately, and are highly encouraged to submit their code for reproducibility purposes. Following the NeurIPS workshop guidelines, previously published papers are discouraged, but you are allowed to submit arXiv papers or papers that are currently under review (for example, NeurIPS submissions can be submitted concurrently to workshops). However, a work presented at the main NeurIPS conference should not also appear in a workshop. Please make sure to indicate the complete list of conflicts of interest for all authors of your paper. To encourage higher-quality submissions, our sponsors are offering Best Paper and Best Poster Awards to outstanding original oral and poster presentations (upon nomination by the reviewers). Bear in mind that our workshop is non-archival, but accepted papers will be hosted on the workshop website. Moreover, we are currently negotiating with a publisher to host opt-in accepted papers in a special-issue proceedings for our workshop.

Important Dates:

  • Submission Deadline: September 15, 2024 Anywhere on Earth (AOE)
  • Acceptance Notification: October 14, 2024 AOE
  • Camera-Ready Submission: October 28, 2024 AOE
  • Workshop Date: TBD

Confirmed Keynote Speakers

Danqi Chen
Princeton
Peter Clark
Allen Institute for AI
Weizhu Chen
Microsoft
Tri Dao
Princeton/Together AI
Hannaneh Hajishirzi
University of Washington
Navdeep Jaitly
Apple
Maciej Besta
ETH Zurich
Lili Mou
University of Alberta

Confirmed Panelists

Marjan Ghazvininejad
Meta
Lu Hou
Huawei
Joel Hestness
Cerebras
Katie Derthick
Microsoft

Tentative Schedule

Time  Title  Presenter
8:10AM - 8:15AM  Opening Speech
8:15AM - 8:45AM  (Keynote Talk) Title TBD  Prof. Maciej Besta
8:45AM - 9:15AM  (Keynote Talk) Title TBD  Dr. Peter Clark
9:15AM - 10:00AM  Accepted Oral Presentations  TBD
10:00AM - 10:30AM  Morning Break
10:30AM - 11:00AM  (Keynote Talk) Title TBD  Prof. Christopher Ré
11:00AM - 11:30AM  (Keynote Talk) Title TBD  Dr. Navdeep Jaitly
11:30AM - 12:00PM  (Keynote Talk) Title TBD  Prof. Danqi Chen
12:00PM - 12:30PM  Accepted Oral Presentations  TBD
12:30PM - 1:15PM  Lunch Break
1:15PM - 2:00PM  Poster Session I & Free Discussion
2:00PM - 2:30PM  (Keynote Talk) Title TBD  Dr. Weizhu Chen
2:30PM - 3:00PM  (Keynote Talk) Title TBD  Prof. Hannaneh Hajishirzi
3:00PM - 3:15PM  Afternoon Break
3:20PM - 4:10PM  Interactive Panel Discussion
  • Dr. Marjan Ghazvininejad
  • Dr. Joel Hestness
  • Dr. Lu Hou
4:10PM - 4:15PM  Best Paper and Poster Awards
4:15PM - 5:00PM  Poster Session II & Free Discussion

Organizers

Mehdi Rezagholizadeh
Huawei Noah's Ark Lab
Peyman Passban
Sanofi
Yu Cheng
Chinese University of Hong Kong
Soheila Samiee
BASF
Yue Dong
University of California, Riverside
Vahid Partovi Nia
Ecole Polytechnique Montreal & Huawei
Qun Liu
Huawei Noah's Ark Lab
Boxing Chen
Huawei Noah's Ark Lab

Volunteers

David Alfonso-Hermelo
Huawei Noah's Ark Lab
Khalil Bibi
Haven Studios
Mahsa Ghazvini Nejad
Huawei Noah's Ark Lab
Ali Edalati
Huawei Noah's Ark Lab


Confirmed Technical Committee

  • Dasgupta Sabyasachi (Sanofi)
  • Dan Alistarh (ISTA)
  • Vahid Partovi Nia (Ecole Polytechnique Montreal & Huawei)
  • Tanya Roosta (Amazon)
  • Peyman Passban (Sanofi)
  • Ehsaneddin Asgari (QCRI)
  • Hamidreza Saghir (Microsoft)
  • Yue Dong (University of California, Riverside)
  • Ruijiang Li (Sanofi)
  • Abbas Ghaddar (Huawei Noah's Ark Lab)
  • Alireza Ghaffari (McGill University)
  • Yu Cheng (Chinese University of Hong Kong)
  • Jahangir Alam (CRIM-Montreal)
  • Hamidreza Mahyar (McMaster University)
  • Yufei Cui (Huawei Noah's Ark Lab)
  • Mahdi Biparva (Huawei Noah's Ark Lab)
  • Soheila Samiee (BASF)
  • Walid Ahmed (Huawei Technologies Canada)
  • Ehsan Kamalloo (ServiceNow Research)
  • Anderson Avila (INRS-EMT)
  • Abbas Rahimi (IBM)
  • David Alfonso Hermelo (Huawei Noah's Ark Lab)
  • Makesh Narsimhan Sreedhar (NVIDIA)
  • Ahmad Rashid (University of Waterloo & Vector Institute)
  • Suyuchen Wang (Universite de Montreal & Mila)
  • Tianyu Jiang (University of Cincinnati)
  • Peilin Yu (Brown University)
  • Khalil Bibi
  • Aysegul Bumin (Amazon)
  • Abderrahim Fathan (CRIM-Montreal)
  • Aref Jafari (University of Waterloo)
  • Dan Fu (Stanford University)
  • Anusha Sabbineni (Amazon)
  • Parsa Omidi (Huawei Technologies Canada)
  • Young Jin Kim (Microsoft)
  • Giovanni Monea (EPFL)
  • Mofetoluwa Adeyemi (University of Waterloo)
  • Xindi Wang (University of Western Ontario)
  • Alessio Brutti (Fondazione Bruno Kessler)
  • Saleh Ashkboos (ETH Zurich)
  • Parsa Kavehzadeh (Huawei Noah's Ark Lab)
  • Hossein Rajabzadeh (University of Waterloo)
  • Mohammadreza Tayaranian (McGill University)
  • Varun Gangal (ASAPP Inc.)
  • Sebastian Jaszczur (IDEAS NCBR, University of Warsaw)
  • Ali Edalati (Huawei Noah's Ark Lab)
  • Mojtaba Valipour (University of Waterloo)
  • Heitor Guimarães (INRS University)
  • Jing Li (Mitsubishi Electric Research Laboratories)
  • Mohammad Ruhul Amin (Fordham University)
  • Mohammad Dehghan (Autodesk)
  • Raffy Fahim (Microsoft)
  • Feiyang Kang (Virginia Tech)
  • Ning Shi (University of Alberta)
  • Daria Soboleva (Cerebras Systems)
  • Qingru Zhang (Georgia Institute of Technology)
  • Lilly Kumari (University of Washington)
  • Thomas Ortner (IBM Research Zurich - Europe)
  • Dominik Wagner (Technische Hochschule Nuernberg)
  • Benyamin Jamialahmadi (University of Waterloo)
  • Tianshu Zhu (Huawei Noah's Ark Lab)
  • Haoran Zhao (Drexel University & University of Washington)
  • Satya Sai Srinath Namburi (Amazon)
  • Mouloud Belbahri (Layer 6 AI)
  • Abhishek Panigrahi (Princeton University)
  • Arthur Pimentel (INRS)
  • Mahsa Salmani (Huawei Technologies Canada)
  • Mohammad Ali Alomrani (Huawei Noah's Ark Lab)
  • Abdul Hameed Azeemi (Lahore University)
  • Mohammadreza Pourreza (Google Research)
  • Yunan Zhang (Microsoft)
  • MohammadAli SadraeiJavaheri (Sharif University)
  • Omid Ghahroodi (Sharif University)
  • Adam Lee (UC Berkeley)


Diamond Sponsors



Platinum Sponsor

Gold Sponsors