Norwegian Language Models

Borealis is a series of open language models from the National Library of Norway (Nasjonalbiblioteket), aimed at strengthening Norwegian and Sami in AI. The models are produced by continued training of Google's Gemma models on Norway's digital cultural heritage. They are released in several sizes under an open license and can be fine-tuned and run on an organization's own infrastructure. They are a concrete example of the sovereign, or national, open-model strategy: small, specialized models that complement, rather than replace, the large commercial frontier models from OpenAI, Anthropic, Google, and Meta.

Small specialized models match large ones on bounded tasks

On narrow, well-defined language tasks where training data can be tailored, small models often perform on par with or better than large ones. The cited use cases are text classification and categorization, named-entity recognition for Norwegian names, sentiment analysis tuned to Norwegian usage, summarization and key-information extraction, and translation, especially between Bokmål and Nynorsk and, eventually, the Sami languages. Fine-tuning a Borealis model on domain-specific text (legal, medical, public administration, finance, technical documentation) can make it very strong within that domain.

Local execution enables digital sovereignty and privacy

A central argument for national open models is that data need not leave the organization's own infrastructure or Norwegian jurisdiction. Running small models locally supports use in healthcare (patient data kept off foreign clouds), defense and security (classified material), the justice system (case documents under strict privacy requirements), public administration (compliance with Schrems II, GDPR, and national security requirements), and critical infrastructure (models that work without internet access). This is the practical mechanism behind "sovereign AI": control and locality rather than dependence on foreign hyperscalers.

Smaller models are cheaper, faster, and lower-energy

Small models carry substantially lower operating cost and latency. This makes them suited to high-volume document processing (tagging and metadata over millions of documents), real-time Norwegian chatbots, embedding in products where calls to large models are too slow or costly, and batch processing of archives. They also have a much smaller energy footprint: a 7B model uses a fraction of the energy of a several-hundred-billion-parameter model, and it can run on energy-efficient hardware. The National Library frames this as the more responsible choice when a frontier model is not required, particularly on Norway's renewable-energy infrastructure.
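
The size comparison can be made concrete with a rough calculation. The sketch below is not from the National Library; it assumes the common rule of thumb that a dense transformer's forward pass costs about 2 FLOPs per parameter per generated token, and it uses a hypothetical 400B-parameter model to stand in for "several hundred billion". Compute is only a proxy for energy, since hardware, batching, and quantization all shift the real figures.

```python
# Rough per-token compute comparison between a small and a large dense model.
# Assumption (not from the source): forward pass ~ 2 FLOPs per parameter per token.
# The 400B size is a hypothetical stand-in for "several hundred billion parameters".

FLOPS_PER_PARAM_PER_TOKEN = 2


def forward_flops_per_token(num_params: float) -> float:
    """Approximate forward-pass FLOPs needed to generate one token."""
    return FLOPS_PER_PARAM_PER_TOKEN * num_params


def compute_ratio(small_params: float, large_params: float) -> float:
    """Fraction of the large model's per-token compute that the small model needs."""
    return forward_flops_per_token(small_params) / forward_flops_per_token(large_params)


small = 7e9    # 7B parameters, as in the text
large = 400e9  # hypothetical large model

ratio = compute_ratio(small, large)
print(f"Small model needs about {ratio:.2%} of the large model's per-token compute")
```

Under these assumptions the 7B model needs under 2% of the per-token compute, which is the sense in which it uses "a fraction" of the resources.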

Small models serve as components inside larger AI systems

Rather than standing alone, small Norwegian models fill roles inside multi-model pipelines: query rewriting, reranking, and answer validation in RAG; request routing, tool selection, and simple reasoning in agent systems; quality-checking and filtering output from larger models; and ensuring generated text is good, clear Norwegian in both Nynorsk and Bokmål. The smallest models can also run at the edge: on laptops, phones, field and emergency equipment without stable connectivity, and embedded IoT and industrial systems.
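
The routing role described above can be sketched as follows. This is a minimal illustration, not an implementation from the source: every name is hypothetical, the two model functions are stubs standing in for real inference calls, and a fixed task table stands in for the routing decision that a small model would make in practice.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Route:
    name: str
    handler: Callable[[str], str]


def small_local_model(prompt: str) -> str:
    """Stub for a small model running on local infrastructure (hypothetical)."""
    return f"[small-local] {prompt}"


def large_remote_model(prompt: str) -> str:
    """Stub for a large general-purpose model (hypothetical)."""
    return f"[large-remote] {prompt}"


# Bounded tasks the text assigns to small models; this table is an assumption
# that replaces a learned routing decision for the purpose of the sketch.
SMALL_MODEL_TASKS = {"rewrite_query", "rerank", "validate_answer", "classify"}


def route_request(task: str) -> Route:
    """Send bounded tasks to the small model and everything else to the large one."""
    if task in SMALL_MODEL_TASKS:
        return Route("small", small_local_model)
    return Route("large", large_remote_model)


def handle(task: str, prompt: str) -> str:
    """Dispatch a prompt to whichever model the router selects."""
    return route_request(task).handler(prompt)


print(handle("rewrite_query", "aviser fra 1905"))
print(handle("open_ended_analysis", "sammenlign kildene"))
```

Keeping the routing decision separate from the model calls means the lookup table could later be swapped for a small classifier without changing the callers.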

Open weights unlock cultural-heritage, research, and education use

For the National Library itself, Borealis enables improved search and retrieval in digitized collections, OCR post-processing and transcription of historical Norwegian text, automatic metadata generation, and large-corpus analysis of language change over time. Open weights and documented training data also give academia inspectable, reproducible models that students and researchers can modify and run without cloud costs. This is a distinct advantage of a publicly funded open model over closed commercial APIs.