Apple released eight very small open-source language models

Apple recently released the OpenELM (Open Efficient Language Model) family of models on Hugging Face. Other than the models' size, a key differentiator of the OpenELM models is the approach to non-uniform parameter allocation within each transformer model layer.

by Ellie Ramirez-Camara

Apple recently released the OpenELM (Open Efficient Language Model) family of models on Hugging Face. The family comprises four base models ranging from 270 million to 3 billion parameters, plus their instruction-tuned variants, for eight models in total. In addition to model weights and inference code, the release includes a complete training and evaluation framework built on publicly available datasets. The framework contains training logs, multiple checkpoints, pre-training configurations, and code to convert the models to the MLX library, enabling local fine-tuning and inference on Apple devices. Although smaller, the models serve a similar purpose to Microsoft's Phi-3 models, launched earlier this week. Interestingly, Phi-3-mini, the smallest member of the Phi-3 family, has 3.8 billion parameters, whereas the largest OpenELM model has only 3 billion.
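
The checkpoints can be pulled directly from Hugging Face with the transformers library. Below is a minimal sketch, assuming the repo id apple/OpenELM-270M as listed on the Hub at release; OpenELM ships no tokenizer of its own, and Apple's model card pairs it with the Llama 2 tokenizer, which sits in a gated repository.

```python
# Minimal sketch of running an OpenELM checkpoint with Hugging Face transformers.
# Repo ids and the Llama 2 tokenizer pairing follow Apple's model cards at release;
# adjust them if the hosted artifacts change.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M",     # smallest of the eight released checkpoints
    trust_remote_code=True,   # OpenELM ships custom modeling code in its repo
)

# Apple did not bundle a tokenizer; the model card uses Llama 2's (gated repo,
# requires accepting Meta's license on Hugging Face).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```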

Other than the models' size, a key differentiator of the OpenELM models is their approach to non-uniform parameter allocation within each transformer layer. Rather than giving every layer an identical configuration, the research team adjusted the number of attention heads and the feed-forward network (FFN) multiplier for each layer, resulting in a non-uniform parameter allocation across the depth of the model. This non-uniform distribution makes more efficient use of the available parameter budget and improves model performance while training on fewer tokens. The research paper reports that the 1.1 billion-parameter OpenELM variant achieves 2.36% higher accuracy than the 1.2 billion-parameter OLMo on OpenLLM Leaderboard tasks, while being pre-trained on half as many tokens.
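
To make the layer-wise scaling idea concrete, the sketch below (not Apple's code) interpolates per-layer attention-head counts and FFN widths between a minimum and maximum scaler, loosely following the scheme described in the OpenELM paper; the function name, the alpha/beta ranges, and the toy dimensions are illustrative assumptions.

```python
# Illustrative sketch of non-uniform (layer-wise) parameter allocation:
# attention heads and FFN width grow linearly from the first transformer
# layer to the last instead of staying constant across the stack.

def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Return a (num_heads, ffn_width) pair for each layer."""
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1)                  # 0.0 at the first layer, 1.0 at the last
        a = alpha[0] + (alpha[1] - alpha[0]) * t  # scales the number of attention heads
        b = beta[0] + (beta[1] - beta[0]) * t     # scales the FFN hidden width
        num_heads = max(1, round(a * d_model / head_dim))
        ffn_width = round(b * d_model)
        configs.append((num_heads, ffn_width))
    return configs

# Toy example: a 4-layer model with d_model=1280 and 64-dimensional heads.
for layer, (heads, ffn) in enumerate(layerwise_scaling(4, 1280, 64)):
    print(f"layer {layer}: {heads} heads, FFN width {ffn}")
```

The net effect is that layers near the input get a smaller share of the parameter budget and layers near the output a larger one, without changing the total depth of the network.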

The models have a 2048-token context window and were trained on a dataset comprising RefinedWeb, a deduplicated version of PILE, a subset of RedPajama, and a subset of Dolma v1.6, which together reportedly amount to roughly 1.8 trillion tokens. Although Apple recognizes that reproducibility and transparency are essential to advancing open research, and can only be achieved by making the full set of source code, model weights, and training materials available, the company also cautions that the models are released with no safety guarantees.
