How OLMo From AI2 Redefines LLM Innovation
The Allen Institute for AI (AI2) created the Open Language Model, or OLMo, an open-source large language model built to advance the science of language modeling through open research. It marks a major milestone in the evolution of LLMs.
Unlike current open large language models such as Llama and Mistral, which typically release model weights but withhold the full training data, training code, or evaluation methodologies behind them, OLMo stands out by providing complete access to its pre-training data, training code, model weights, and evaluation suite.
OLMo represents a collaborative effort to advance the science of language models: its developers aim to give academics and researchers everything necessary for open research, from the data and training code to the evaluation tooling.
OLMo is trained on AI2’s Dolma dataset, a three-trillion-token open corpus, and the release includes full model weights for four model variants at the 7B scale, each trained to at least 2T tokens. Notable aspects include its training approach, its scale, and the diversity of the data it was trained on; what sets it apart from predecessors is its open-source nature and the comprehensive release of training and evaluation tools.
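To make that release concrete, here is a minimal sketch of loading one of the released 7B variants and generating text with the Hugging Face transformers library. The model ID and loading options are assumptions based on AI2’s public Hugging Face releases, not details from this article; verify the exact identifier and any extra package requirements against the official OLMo documentation.

```python
# Minimal sketch: load a released OLMo 7B variant and generate text.
# The model ID below is an assumption; check AI2's release notes for the
# exact identifier and any extra packages the checkpoint requires.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B"  # assumed Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Language models are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```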
OLMo’s key differentiators include:
Full Pre-training Data and Code: OLMo is built on AI2’s Dolma dataset, a three-trillion-token open corpus spanning a diverse mix of web content, academic publications, code, books, and encyclopedic materials. The dataset is publicly available, so researchers can inspect and reuse the exact data used for model training (a small inspection sketch follows this list).
Comprehensive Framework Release: The framework includes not just the model weights but also the training code, inference code, training metrics, and logs for four model variants at the 7B scale. It even provides over 500 checkpoints per model for in-depth evaluation, all under the Apache 2.0 license (see the checkpoint-loading sketch after this list).
Evaluation and Benchmarking Tools: AI2 has released Paloma, a benchmark for evaluating language-model perplexity across hundreds of domains. This enables standardized performance comparisons and deeper insights into model capabilities and limitations (a simplified perplexity sketch appears below).
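Dolma is distributed as large collections of compressed JSON Lines files. The sketch below shows one way a researcher might peek at a downloaded shard; the file name and record fields are assumptions about the distribution format, so consult the Dolma documentation for the exact schema and download paths.

```python
# Sketch: inspect a few documents from a locally downloaded Dolma shard.
# The file name and the record fields ("id", "text") are assumptions about
# the distribution format; see the Dolma documentation for the exact schema.
import gzip
import json

shard_path = "dolma_shard_0000.json.gz"  # hypothetical local file

with gzip.open(shard_path, "rt", encoding="utf-8") as f:
    for i, line in enumerate(f):
        doc = json.loads(line)
        print(doc.get("id"), (doc.get("text") or "")[:80])
        if i == 4:  # only show the first five documents
            break
```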
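In AI2’s Hugging Face releases, intermediate training checkpoints are typically exposed as revisions of the model repository. The sketch below lists those revisions and loads one of them; the model ID and revision name are illustrative assumptions, so read the real checkpoint names from the repository’s branch list.

```python
# Sketch: enumerate and load an intermediate OLMo training checkpoint.
# The model ID and the revision name are illustrative assumptions; read the
# real checkpoint names from the repository's branch list.
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

model_id = "allenai/OLMo-7B"  # assumed Hugging Face model ID

# List the revisions (branches) published for this model repository.
refs = list_repo_refs(model_id)
print([branch.name for branch in refs.branches][:10])

# Load one intermediate checkpoint by revision (name is hypothetical).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="step1000-tokens4B",  # hypothetical revision name
    trust_remote_code=True,
)
```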
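Paloma ships with its own data and evaluation harness, but its core measurement is perplexity over text drawn from many domains. The snippet below is a simplified stand-in that computes perplexity for a single text sample with transformers; it is not the official Paloma code, and the model ID is an assumption.

```python
# Simplified perplexity measurement in the spirit of Paloma-style evaluation.
# This is NOT the official Paloma harness; it only illustrates the core metric.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

text = "Open research on language models benefits from shared data and tooling."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels supplied, the model returns the mean cross-entropy loss,
    # which exponentiates to perplexity.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {math.exp(loss.item()):.2f}")
```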
In contrast to contemporaries like Llama and Mistral, which have made significant contributions to the AI landscape through their respective advances and specializations, OLMo sets a new precedent for openness and transparency. It promotes a collective, transparent approach to understanding, improving, and ethically advancing the capabilities of language models.
The development of OLMo is a collaborative effort involving partnerships with several organizations and institutions. AI2 teamed up with AMD and CSC, using the GPU partition of the all-AMD-powered LUMI pre-exascale supercomputer, a collaboration that supplies the hardware and computing resources needed to train OLMo.
AI2 has partnered with organizations such as Surge AI and MosaicML for data and training code. These partnerships are crucial for providing the diverse datasets and sophisticated training methodologies that underpin OLMo’s capabilities. The collaboration with the Paul G. Allen School of Computer Science and Engineering at the University of Washington and Databricks Inc. has also been pivotal in realizing the OLMo project.
It is important to note that the current OLMo models are base models, not the instruction-tuned models that power chatbots and AI assistants; instruction tuning is on the roadmap. According to AI2, multiple enhancements are planned: in the coming months the team intends to iterate on OLMo with different model sizes, modalities, datasets, and capabilities. This iterative process is aimed at continuously improving the model’s performance and utility for the research community.
OLMo’s open and transparent approach, along with its advanced capabilities and commitment to continuous improvement, makes it a major milestone in the evolution of LLMs.