README.md: bitnet.cpp is the official inference framework for 1-bit LLMs (e.g., BitNet b1.58). It offers a suite of optimized kernels that support fast and lossless inference of 1.58-bit models on CPU, with NPU and GPU support coming next.
In this work, we introduce BitNet, a scalable and stable 1-bit Transformer architecture designed for large language models. Specifically, we introduce BitLinear as a drop-in replacement for the nn.Linear layer in order to train 1-bit weights from scratch.
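The BitLinear idea above can be sketched in plain Python. The centering, sign binarization, and beta scaling follow the scheme described in the BitNet paper; the function names and the list-of-rows matrix representation here are illustrative assumptions, not the reference implementation (which operates on framework tensors with straight-through gradient estimation during training).

```python
# Minimal sketch of BitNet-style 1-bit weight binarization (pure Python,
# no framework): center the weights, take the sign to get {-1, +1}, and
# keep a per-matrix scale beta (the mean absolute value) so the output
# magnitude is preserved.

def binarize_weights(w):
    """Binarize a weight matrix (list of rows) to {-1.0, +1.0} plus a scale beta."""
    flat = [x for row in w for x in row]
    n = len(flat)
    alpha = sum(flat) / n                 # mean, used to center the weights
    beta = sum(abs(x) for x in flat) / n  # scale that restores output magnitude
    wb = [[1.0 if (x - alpha) >= 0 else -1.0 for x in row] for row in w]
    return wb, beta

def bitlinear_forward(x, w):
    """Simplified BitLinear forward: y = (x @ sign(W - mean(W))^T) * beta."""
    wb, beta = binarize_weights(w)
    return [beta * sum(xi * wi for xi, wi in zip(x, row)) for row in wb]
```

In a real BitLinear layer this binarization happens on the fly in the forward pass, so the full-precision latent weights remain available for gradient updates during training.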
Key Features of BitNet.cpp. BitNet.cpp comes with a treasure trove 🪙 of features designed to optimize performance and usability. Optimized Performance 🚀: BitNet.cpp is fine-tuned to run seamlessly on both ARM and x86 CPUs, which are commonly found in PCs and mobile devices. Performance gains are impressive 🔥: on ARM CPUs, speedups range from 1.37x to 5.07x, and on x86 CPUs they reach up to 6.17x.
27 Feb 2024 · Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}.
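The ternary quantization behind BitNet b1.58 can be sketched as the absmean scheme the b1.58 report describes: divide each weight by the matrix's mean absolute value, then round and clip into {-1, 0, +1}. The code below is a hedged pure-Python illustration; the function names and the list-of-rows representation are assumptions for readability.

```python
# Sketch of absmean ternary quantization (BitNet b1.58 style): scale the
# weight matrix by its mean absolute value gamma, then round each entry
# and clip it into the ternary set {-1, 0, +1}.

def absmean_quantize(w, eps=1e-6):
    """Map a weight matrix (list of rows) to ternary {-1, 0, +1} values."""
    flat = [abs(x) for row in w for x in row]
    gamma = sum(flat) / len(flat)  # mean absolute value of all weights

    def round_clip(v):
        # nearest integer, clipped to [-1, 1]
        return max(-1, min(1, round(v)))

    wq = [[round_clip(x / (gamma + eps)) for x in row] for row in w]
    return wq, gamma
```

Note how small weights map to 0: this is exactly the "feature filtering" effect that the inclusion of 0 enables, since zeroed weights drop their inputs entirely.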
22 Oct 2024 · Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1.58, present a promising approach to enhancing the efficiency of LLMs in terms of speed and energy consumption. These developments also enable local LLM deployment across a broad range of devices. In this work, we introduce bitnet.cpp, a tailored software stack designed to unlock the full potential of 1-bit LLMs ...
29 Feb 2024 · BitNet b1.58 retains all the benefits of the original 1-bit BitNet. Furthermore, BitNet b1.58 offers two additional advantages. (1) Its modeling capability is stronger due to its explicit support for feature filtering, made possible by the inclusion of 0 in the model weights, which can significantly improve the performance of 1-bit LLMs.