Sitemap URLs and HTTP 200
An XML sitemap tells search engines which URLs you care about. Each listed URL should respond with HTTP 200 on the first response, without relying on redirects (3xx), client errors (4xx), or server errors (5xx). That keeps crawl budgets predictable and avoids sending crawlers through unnecessary hops or dead ends.
This is separate from “soft” error pages that still return 200 with a thin or misleading body; those require manual review and are not inferred automatically here.
What SEO Perception does
We discover sitemap documents from your site’s robots.txt file using standard Sitemap: lines (case-insensitive). If none are declared, we also try a common default path (/sitemap.xml on your site’s host).
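As a rough illustration of that discovery step, here is a minimal Python sketch. The function names (`sitemaps_from_robots`, `discover_sitemaps`) and the fixed default path are assumptions for the example, not the product's actual code; it only shows the case-insensitive `Sitemap:` matching and the fallback behavior described above.

```python
from urllib.parse import urljoin

def sitemaps_from_robots(robots_txt):
    """Collect URLs from "Sitemap:" lines, matched case-insensitively."""
    found = []
    for line in robots_txt.splitlines():
        # Split on the first colon: "Sitemap: https://…" -> ("Sitemap", "https://…")
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found

def discover_sitemaps(robots_txt, site_root):
    """Fall back to a conventional default path when robots.txt declares none."""
    return sitemaps_from_robots(robots_txt) or [urljoin(site_root, "/sitemap.xml")]
```

For example, a robots.txt containing only `Disallow: /` yields the default `/sitemap.xml` on the site's host, while any number of declared `Sitemap:` lines are returned as-is.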
We parse urlset sitemaps for <loc> entries and follow sitemap index files to nested sitemaps, with limits on how many documents and page URLs we process per run so large or unusual setups stay bounded.
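To make the parsing concrete, the sketch below distinguishes a sitemap index from a urlset and extracts bounded `<loc>` values using Python's standard library. The `max_urls` cap and the function name are illustrative assumptions standing in for the per-run limits mentioned above.

```python
import xml.etree.ElementTree as ET

# Standard sitemap XML namespace.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_locs(xml_text, max_urls=50000):
    """Return (kind, locs): kind is "index" or "urlset"; locs are <loc> values.

    The max_urls cap is an illustrative bound so oversized documents stay bounded.
    """
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == f"{NS}sitemapindex" else "urlset"
    locs = [el.text.strip() for el in root.iter(f"{NS}loc") if el.text][:max_urls]
    return kind, locs
```

When `kind` is `"index"`, each returned value is itself a sitemap URL to fetch and parse the same way, which is how nested sitemaps get followed.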
For each in-scope page URL (the same crawl scope as your website, including subdomain settings where applicable), we check the status recorded in your crawled URL data when that URL has already been crawled. If a URL is listed in the sitemap but not yet in the crawl database, we may run a light HTTP check (first response only, redirects not followed) for a small number of such URLs per run.
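A "first response only" check can be sketched like this, assuming a HEAD request and Python's standard `urllib`; the `first_response_status` name is hypothetical. The point is that a 3xx is reported as-is rather than followed to its destination.

```python
import urllib.error
import urllib.request

class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None tells urllib not to follow the redirect, so the
    # 3xx surfaces as an HTTPError instead of being resolved silently.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def first_response_status(url, timeout=10):
    """Return the HTTP status of the first response for url (redirects not followed)."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(urllib.request.Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # 3xx/4xx/5xx all arrive here once redirects are unhandled
```

A URL behind a `301` therefore reports `301`, not the `200` of its redirect target, which matches the "first HTTP response" framing above.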
When a listed URL’s first response is not 200, you may see a possibility on that URL explaining the status we observed. Fixing the server response, or updating the sitemap to list the live, canonical URLs, addresses the finding.