v2.4.6

January 28, 2026

Reliable website deduplication with per-page content hashing

WebsiteReader now computes a unique content hash for each crawled URL, so skip_if_exists behaves correctly on multi-page crawls. Each page is deduplicated against its own hash, which ensures accurate per-page deduplication, reduces redundant ingestion, and saves processing cost during re-crawls.
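For illustration, here is a minimal sketch of the per-page hashing idea in Python. The page_hash helper, the example.com URLs, and the in-memory seen set are assumptions for this sketch only, not the library's actual implementation or API.

    import hashlib

    def page_hash(url: str, content: str) -> str:
        # Fingerprint each crawled page by hashing its URL together with its
        # extracted content, so every page deduplicates independently.
        return hashlib.sha256(f"{url}\n{content}".encode("utf-8")).hexdigest()

    # Simulated crawl results: two pages, plus a re-crawl of the first page.
    crawled = [
        ("https://example.com/docs/intro", "Welcome to the docs."),
        ("https://example.com/docs/api", "API reference."),
        ("https://example.com/docs/intro", "Welcome to the docs."),  # unchanged on re-crawl
    ]

    seen: set[str] = set()
    for url, content in crawled:
        h = page_hash(url, content)
        if h in seen:
            print(f"skip (already indexed): {url}")
            continue
        seen.add(h)
        print(f"index: {url}")

With one hash per URL, only pages whose content has not been seen before are ingested, which is the per-page behavior skip_if_exists now follows.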

Details

  • Correct per-page deduplication for predictable skip_if_exists behavior
  • Fewer unnecessary writes and lower token usage when re-indexing multi-page sites
  • Action required: Clear existing website crawl entries in your knowledge store before re-indexing to avoid duplicates

Who this is for: Teams maintaining search indexes, documentation portals, or knowledge bases sourced from websites.