v2.4.6
January 28, 2026
Breaking change: website crawling uses per-page content hashes
We changed the WebsiteReader deduplication model to compute a content hash for each crawled page rather than a single hash for the crawl as a whole. This aligns skip_if_exists with page-level updates, so re-crawls re-index only pages whose content has changed (see the sketch below).
Details
- Behavior change: Deduplication now occurs at page granularity, not at the aggregate crawl level
- Action required: Clear existing website crawl entries before re-indexing to prevent duplicates left over from the old aggregate-level hashes
- Benefits: Higher correctness, predictable re-crawls, and lower operational overhead
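To illustrate the new model, here is a minimal sketch of page-level hashing combined with skip_if_exists-style skipping. The function names (page_hash, upsert_pages) and the in-memory hash set are illustrative assumptions, not the library's actual API.

```python
# Conceptual sketch only: names and storage are assumptions, not the WebsiteReader internals.
import hashlib


def page_hash(url: str, content: str) -> str:
    """Hash a single page's content so changes are detected per page, not per crawl."""
    return hashlib.sha256(f"{url}\n{content}".encode("utf-8")).hexdigest()


def upsert_pages(pages: dict[str, str], existing_hashes: set[str]) -> list[str]:
    """Index only pages whose content hash is not already stored (skip_if_exists behavior)."""
    indexed = []
    for url, content in pages.items():
        digest = page_hash(url, content)
        if digest in existing_hashes:
            continue  # unchanged page: skipped on re-crawl
        existing_hashes.add(digest)
        indexed.append(url)
    return indexed


if __name__ == "__main__":
    seen: set[str] = set()
    first = upsert_pages({"https://example.com/a": "v1", "https://example.com/b": "v1"}, seen)
    # Re-crawl after page /a changes: only the changed page is re-indexed.
    second = upsert_pages({"https://example.com/a": "v2", "https://example.com/b": "v1"}, seen)
    print(first)   # ['https://example.com/a', 'https://example.com/b']
    print(second)  # ['https://example.com/a']
```

Because entries indexed under the previous aggregate-level model were keyed differently, they will not match the new per-page hashes; that is why a one-time cleanup before re-indexing is required.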
Who this is for: Engineering teams managing recurring website crawls and large content refreshes.
