v2.4.6

January 28, 2026

Breaking change: website crawling uses per-page content hashes

We changed the WebsiteReader deduplication model to compute content hashes per page rather than for the crawl as a whole. This aligns skip_if_exists with page-level updates, so re-crawls pick up only pages whose content actually changed.

Details

  • Behavior change: Deduplication occurs at page granularity, not aggregate level
  • Action required: Clear existing website crawl entries before re-indexing to prevent duplicates
  • Benefits: Higher correctness, predictable re-crawls, and lower operational overhead
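To make the new behavior concrete, here is a minimal sketch of page-granular deduplication. The function names (`page_hash`, `crawl`) and data shapes are illustrative, not the WebsiteReader API; the point is that each page gets its own content hash, and only pages whose hash is unseen are re-processed.

```python
import hashlib

def page_hash(url: str, content: str) -> str:
    # Hash each page's URL + content individually, so dedup
    # happens at page granularity, not for the aggregate crawl.
    return hashlib.sha256(f"{url}\n{content}".encode("utf-8")).hexdigest()

def crawl(pages: dict[str, str], seen_hashes: set[str]) -> list[str]:
    """Return the URLs whose content changed since the last crawl.

    `pages` maps URL -> page content; `seen_hashes` persists across
    crawls (hypothetical stand-in for the stored crawl entries).
    """
    updated = []
    for url, content in pages.items():
        h = page_hash(url, content)
        if h in seen_hashes:
            continue  # unchanged page: skip_if_exists semantics
        seen_hashes.add(h)
        updated.append(url)
    return updated
```

Under this model, a re-crawl where only one page changed re-indexes just that page. This is also why existing crawl entries must be cleared before re-indexing: entries hashed under the old aggregate model will not match the new per-page hashes, so every page would look new and be stored again as a duplicate.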

Who this is for: Engineering teams managing recurring website crawls and large content refreshes.