Asoba Ona Documentation

Nehanda

A 32B parameter language model fine-tuned for intelligence assessment, signal detection, and global systems analysis, achieving perfect multi-turn epistemic consistency.

Model: asoba/nehanda-v2-32b on Hugging Face


Overview

Nehanda v2 is a specialized language model that departs from standard chat models to focus on forensic analysis and evidence-based assessment. Built on Qwen 2.5-32B, it prioritizes provenance and structure over fluency, explicitly stating when information is unknown rather than fabricating.

🧪

Model Evaluation

Comprehensive evaluation of Nehanda's epistemic consistency, signal detection capabilities, and multi-turn reasoning performance.

View Evaluation Results

Nehanda serves as the default synthesis engine for Zorora, powering deep research workflows that require rigorous citation tracing and credibility assessment.

Key Achievement: Perfect Multi-Turn Consistency

Nehanda v2.2 achieves 100% multi-turn epistemic consistency across energy and intelligence domains, matching Claude Opus 4.6 while far outperforming GPT-5 Mini (37.5–50%) under sustained conversational pressure. This makes Nehanda the most reliable model for high-stakes policy and intelligence work where maintaining position under adversarial questioning is critical.

Read the full research: Epistemic Robustness Under Adversarial Narrative Environments


Evaluation Framework

📊

3-Phase Epistemic Stress Test

Nehanda is evaluated using a rigorous 3-phase framework that measures reliability under sustained adversarial pressure. The framework tests whether the model can maintain correct positions when pressured with false premises or conflicting information.

  • Phase 1 (Table Stakes): 24 recall-level tests โ€” any model should score 95%+
  • Phase 2 (Single Hard): 48 higher-order tasks with conflicting sources, embedded falsehoods, and extrapolation traps
  • Phase 3 (Multi-Turn): 16 turns across 4 sequences โ€” the differentiating signal
View Evaluation Details
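The Phase 3 consistency numbers reported below reduce to a simple fraction: of the adversarial follow-up turns, how many did the model hold its correct position on? A minimal sketch of that metric, assuming a turn-level held/conceded verdict structure that is illustrative rather than taken from the actual harness:

```python
# Hypothetical sketch of the Phase 3 consistency metric: the fraction of
# adversarial follow-up turns on which the model holds its correct position.
# The sequence/turn structure here is assumed, not the harness's real format.

def multi_turn_consistency(sequences):
    """sequences: list of sequences; each is a list of booleans,
    True meaning the model held its correct position on that turn."""
    turns = [held for seq in sequences for held in seq]
    return sum(turns) / len(turns)

# Four 4-turn sequences, 16 turns total, mirroring the Phase 3 design.
perfect = [[True] * 4 for _ in range(4)]
print(multi_turn_consistency(perfect))  # 1.0
```

A model that concedes under pressure on even a few turns drops sharply on this metric, which is why it separates models that look similar on single-turn tests.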

Core Capabilities

Signal Detection

Distinguishes between routine noise and pre-cursor indicators of structural shifts in regulatory, financial, and geopolitical systems.

Systems Analysis

Analyzes regulatory, financial, and geopolitical systems using domain knowledge served via RAG at inference time, keeping assessments current and citable.

Citation Tracing

Follows logic chains across multiple sources with provenance tracking, enabling verification of claims back to original documents.
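One way to picture citation tracing is as a provenance chain attached to each claim, so any assessment can be walked back to its original documents. A minimal sketch, with a structure that is illustrative rather than Nehanda's internal format:

```python
# Hypothetical provenance structure for citation tracing: each claim keeps
# an ordered chain of (source, supporting quote) links so it can be verified
# back to the original documents. Illustrative only.
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    # Each entry is (source_id, supporting_quote)
    chain: list = field(default_factory=list)

    def trace(self):
        """Return the ordered list of sources backing this claim."""
        return [source for source, _ in self.chain]


claim = Claim("Rule Y was amended in Q3.")
claim.chain.append(("regulator.gov/filing-x", "Filing X amends rule Y..."))
claim.chain.append(("news/summary-2024", "Regulators confirmed the Q3 change..."))
print(claim.trace())
```

A claim with an empty chain is immediately visible as unsupported, which is the hook the anti-fabrication behavior below relies on.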

Anti-Fabrication

Enforces strict adherence to provided context. Unlike general-purpose LLMs optimized for fluency, Nehanda will state when information is unknown rather than hallucinate.

Multi-Turn Consistency

Maintains correct position under sustained conversational pressure with perfect consistency across adversarial follow-ups.


Architecture

Specification    Value
Base Model       Qwen 2.5-32B
Fine-tuning      Stacked cognitive sequencing (5 stages)
Parameters       32B
Context Window   32K tokens
Tensor Type      BF16
Training Cost    ~$135 for the v2 series (v2: ~$95, v2.1: ~$15, v2.2: ~$25); v1: ~$180

Training Pipeline

  1. Epistemic Foundation - Generic instruction-following + strict logic/reasoning
  2. Epistemic Hardening SFT - Domain-independent reasoning reinforcement
  3. RAG Synthesis SFT - Integration with retrieval-augmented knowledge
  4. Constitutional SFT + Replay Buffer - Alignment with auto-calibrated eval gate
  5. Constitutional DPO - Direct preference optimization on epistemic honesty
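The final stage, Constitutional DPO, optimizes preferences toward epistemic honesty. A sketch of what a preference pair for that stage could look like, using the common prompt/chosen/rejected dataset convention; the actual training data format is an assumption:

```python
# Illustrative shape of a preference pair for a DPO stage targeting epistemic
# honesty: the "chosen" response refuses to fabricate when the context lacks
# the answer, the "rejected" one hallucinates a figure. Field names follow the
# common DPO convention (prompt/chosen/rejected); the real format is assumed.

pair = {
    "prompt": ("### Context:\n(no figures provided)\n"
               "### Instruction:\nState the plant's 2024 output."),
    "chosen": ("The provided context contains no 2024 output figures, "
               "so I cannot state them."),
    "rejected": "The plant produced 4.2 TWh in 2024.",
}


def is_epistemically_honest(response, context_has_answer=False):
    """Toy check: without supporting context, an honest response flags the gap."""
    return context_has_answer or "context" in response.lower()


print(is_epistemically_honest(pair["chosen"]))    # True
print(is_epistemically_honest(pair["rejected"]))  # False
```

Training on many such pairs pushes the model to prefer the explicit "not in context" answer over a fluent fabrication.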

Key Innovation: RAG-Based Domain Knowledge

Unlike v1, which baked domain knowledge into weights, v2 moves factual grounding to a retrieval layer at inference time. This keeps domain knowledge current without retraining and makes every factual claim citable back to a source document.
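A minimal sketch of the RAG pattern described above: retrieved facts are injected into the Context block of the prompt at inference time, so the weights never need retraining when the knowledge changes. The retriever is a stand-in here; any vector store or search backend could fill that role:

```python
# Sketch of inference-time RAG: retrieved documents, with their sources, are
# formatted into the model's Context block. The retrieval backend and document
# schema are assumptions; only the prompt layout follows the usage example
# shown later in this document.

def build_prompt(instruction, retrieved):
    context = "\n".join(
        f"[{i + 1}] {doc['text']} (source: {doc['source']})"
        for i, doc in enumerate(retrieved)
    )
    return (
        "You are an intelligence assessment specialist.\n"
        "### Instruction:\n"
        f"{instruction}\n"
        "### Context:\n"
        f"{context}\n"
        "### Response:"
    )


docs = [{"text": "Filing X amends rule Y.", "source": "regulator.gov/x"}]
print(build_prompt("Assess the regulatory shift.", docs))
```

Because every context entry carries its source, the model's anti-fabrication behavior can cite or decline rather than guess.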


Performance Highlights

Multi-Turn Epistemic Consistency (Phase 3)

Model             Energy Consistency    Intel Consistency
Nehanda v2.2      100%                  100%
Claude Opus 4.6   100%                  100%
GPT-5 Mini        37.5%                 50%
Nehanda v2        43.8%                 50%

Single-Turn Epistemic Resistance (Phase 2)

Dimension     Nehanda v2.2 (Energy)    Nehanda v2.2 (Intel)    GPT-5 Mini
Overall       74.8%                    79.2%                   84.5%
Adversarial   100%                     100%                    100%
Sycophancy    100%                     100%                    100%

Comparison Sequence (Conflicting Sources Under Sycophancy Pressure)

Model           Energy Score    Intel Score
Nehanda v2.2    75%             62.5%
GPT-5 Mini      0%              12.5%
Nehanda v2      0%              0%

Evaluation Framework

Nehanda is evaluated on a custom 3-phase epistemic harness:

  • Phase 1 (Table Stakes) - 24 recall-level tests (10% weight)
  • Phase 2 (Single Hard) - 48 higher-order reasoning tests (35% weight)
  • Phase 3 (Multi-Turn) - 16 turns across 4 sequences (45% weight)

The 3-phase design reveals the differentiating signal: single-turn benchmarks systematically overstate model capability. The gap between Nehanda and frontier models only appears under sustained conversational pressure.
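The phase weights above can be combined into a single headline number. A sketch of that arithmetic; since the stated weights (10% + 35% + 45%) sum to 90%, normalizing them to 1 is an assumption on my part:

```python
# Sketch of a weighted overall score from the three phase scores. The
# 10%/35%/45% weights come from the framework description; normalizing them
# to sum to 1 is an assumption, as they total 90% as stated.

WEIGHTS = {"phase1": 0.10, "phase2": 0.35, "phase3": 0.45}


def overall_score(phase_scores):
    total_w = sum(WEIGHTS.values())
    return sum(WEIGHTS[p] * phase_scores[p] for p in WEIGHTS) / total_w


# Phase 3's 45% weight means multi-turn consistency dominates the headline
# number, which is the point of the design.
print(overall_score({"phase1": 1.0, "phase2": 0.748, "phase3": 1.0}))
```

With this weighting, a model that aces single-turn tests but collapses in Phase 3 still scores poorly overall.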


Integration with Zorora

Nehanda powers the /search and /research commands in Zorora:

  1. Ingest - Raw context from search tools
  2. Triage - Information scored by credibility and relevance
  3. Synthesize - Answers highlighting information gaps, conflicting accounts, and consensus points
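The ingest → triage step above can be sketched as a credibility-times-relevance filter. The scoring rule, field names, and threshold here are illustrative, not Zorora's actual implementation:

```python
# Hypothetical triage step from the pipeline described above: raw search
# results are scored by credibility x relevance and low-scoring items are
# dropped before synthesis. All field names and the threshold are assumptions.

def triage(items, min_score=0.5):
    """Score raw search results and keep only credible, relevant ones."""
    for item in items:
        item["score"] = item["credibility"] * item["relevance"]
    return [i for i in items if i["score"] >= min_score]


raw = [
    {"text": "Official filing", "credibility": 0.9, "relevance": 0.9},
    {"text": "Anonymous forum post", "credibility": 0.2, "relevance": 0.8},
]
kept = triage(raw)
print([i["text"] for i in kept])  # only the high-credibility item survives
```

Whatever survives triage is what the synthesis step sees, which is why the final answers can flag gaps and conflicts rather than average over noise.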

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "asoba/nehanda-v2-32b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit quantization via bitsandbytes; device_map="auto" places the 32B
# model across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

prompt = """You are an intelligence assessment specialist.
### Instruction:
Analyze the provided cable for indicators of regulatory capture.
### Context:
[Your context here]
### Response:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

GGUF Version

A quantized version is available for efficient local inference: asoba/nehanda-v2-32b-GGUF


License

CC-BY-NC-ND-4.0

Access requires sharing contact information via Hugging Face.


Support & Resources



Get Help & Stay Updated

Contact Support

For technical assistance, feature requests, or any other questions, please reach out to our dedicated support team.

Email Support · Join Discord

Subscribe to Updates
