Shopify Sitemap Optimization: Fixing Gaps, Pinging Search Engines, and Adding llms.txt
What Shopify's auto-generated sitemap includes and excludes, how to fix coverage gaps without third-party apps, how often to ping search engines, and how sitemap structure relates to llms.txt for AI crawler discovery.
Shopify generates your sitemap.xml automatically. Most operators assume that means it is correct. It is not — at least not for stores with more than 50 products, multiple blogs, or custom page types. This article covers what Shopify’s sitemap actually includes, where the gaps are, how to fix them, and how your sitemap relates to llms.txt — the newer AI-crawler discovery mechanism.
What Shopify’s Auto-Generated Sitemap Contains
Shopify’s sitemap lives at yourdomain.com/sitemap.xml. It is a sitemap index file pointing to four sub-sitemaps:
| Sub-sitemap | URL | Contents |
|---|---|---|
| Pages | /sitemap_pages_1.xml | All active pages (/pages/about, /pages/faq, etc.) |
| Products | /sitemap_products_1.xml | All published products |
| Collections | /sitemap_collections_1.xml | All published collections (clean URLs only — no filter variants) |
| Blogs | /sitemap_blogs_1.xml | All published blog posts across all blogs |
Each sub-sitemap includes <loc> (URL), <lastmod> (last modified date), and <image:image> entries for the first image on each product/collection page.
For a store with more than 1,000 products or blog posts, Shopify automatically creates additional numbered sub-sitemaps (sitemap_products_2.xml, sitemap_products_3.xml, etc.), paginated at 1,000 URLs each.
The 5 Sitemap Gaps Shopify Doesn’t Tell You About
Gap 1: Custom landing pages are excluded
Pages created outside Shopify’s native /pages/ route are not included in Shopify’s sitemap. Third-party landing page builders (PageFly, GemPages, Shogun), for example, may publish pages at /apps/[builder]/[slug] or inject them at custom routes. Submit these pages manually via the URL Inspection tool in Google Search Console (GSC), or request a custom sitemap from the page builder app.
Gap 2: Blog sub-sitemaps only include the blog post, not the blog index
/blogs/news (your blog index page) is not included in sitemap_blogs_1.xml. Only individual posts (/blogs/news/post-slug) are listed. If your blog index is a target page for a keyword like “[brand] blog,” submit it manually via GSC.
Gap 3: Product images beyond the first are not in the sitemap
The <image:image> extension in Shopify’s sitemap includes only the first product image. Products with 6+ high-quality images are missing 5 image indexation opportunities. Image sitemaps matter more than most operators think — Google Images drives 20–30% of product discovery traffic for apparel, jewelry, and home goods categories.
Fix: Add a custom image sitemap via a Shopify app (Yoast for Shopify handles this) or manually append <image:image> entries for additional product images in a supplemental sitemap.
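As a rough illustration of the manual route, the sketch below builds a supplemental image sitemap from a mapping of product page URLs to image URLs. The function name `build_image_sitemap` and the input shape are assumptions made for this example; how you export image URLs from your store and where you host the resulting file are outside its scope.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(products: dict[str, list[str]]) -> str:
    """Return sitemap XML that lists every image URL under its product page URL.

    `products` maps a product page URL to all of its image URLs
    (a hypothetical input shape chosen for this sketch).
    """
    # Register prefixes so the output uses xmlns="..." and xmlns:image="...".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in products.items():
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image_el = ET.SubElement(url_el, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image_el, f"{{{IMAGE_NS}}}loc").text = image_url

    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling it with one product and its full image list yields one `<url>` entry containing one `<image:image>` element per image, including the images Shopify’s own sitemap leaves out.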
Gap 4: lastmod dates are often wrong
Shopify populates <lastmod> with the product or page’s updated_at timestamp from the admin API. This timestamp updates any time the product is touched in the admin — including inventory adjustments, tag changes, and app writes. A product that had its inventory decremented by 1 at 3am shows <lastmod> as 3am even though no SEO-relevant content changed.
Why it matters: Search engines use <lastmod> to decide what to recrawl first. Google’s sitemap documentation says it uses the value only when it is consistently and verifiably accurate, so timestamps that change without any change in content teach crawlers to discount the signal, which can reduce crawl priority over time.
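To gauge how often a timestamp moves without an SEO-relevant change, one option is to fingerprint only the fields a crawler would see and compare fingerprints between two snapshots of a product. This is a minimal sketch: the field list in `SEO_FIELDS` and the flat `image_urls` key are assumptions for illustration, not Shopify’s own definition of a content change.

```python
import hashlib
import json

# Fields treated as SEO-relevant. This list is an assumption for the sketch;
# adjust it to whatever your product pages actually render.
SEO_FIELDS = ("title", "body_html", "handle", "image_urls")


def seo_fingerprint(product: dict) -> str:
    """Hash only the fields that change what a crawler would see."""
    relevant = {field: product.get(field) for field in SEO_FIELDS}
    canonical = json.dumps(relevant, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def content_changed(before: dict, after: dict) -> bool:
    """True when an SEO-relevant field differs between two product snapshots."""
    return seo_fingerprint(before) != seo_fingerprint(after)
```

An inventory decrement or a new `updated_at` value leaves the fingerprint untouched, while an edited title or description changes it. The same comparison could decide when to bump <lastmod> in a supplemental sitemap you generate yourself.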
Gap 5: Deleted or 404’d products persist in sitemap cache for up to 48 hours
When you delete a product in Shopify, it takes up to 48 hours for the sitemap to regenerate without it. During that window, Googlebot may crawl the deleted product URL from the sitemap and encounter a 404. This is not catastrophic — Googlebot handles 404s gracefully — but it does burn crawl budget. For stores with rapid inventory churn (flash sales, seasonal drops), this 48-hour lag is worth knowing.
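To see how much of that window affects your store, you could check the URLs currently listed in the sitemap and flag the ones that no longer resolve. The sketch below assumes you already have the list of URLs; `http_status` and `find_dead_urls` are illustrative names, and the status lookup is injectable so the filtering logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; the code is what we want to inspect.
        return error.code


def find_dead_urls(
    urls: Iterable[str],
    status_of: Callable[[str], int] = http_status,
) -> list[str]:
    """Return the URLs that respond with 404 (Not Found) or 410 (Gone)."""
    return [url for url in urls if status_of(url) in (404, 410)]
```

Running this shortly after a bulk deletion gives a rough count of the stale entries crawlers may still encounter.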
Verifying Sitemap Coverage
Before fixing gaps, measure them:
- Open yourdomain.com/sitemap.xml
- Count the product URLs: does the number match your published product count in Shopify admin?
- Count collection URLs: does it match your collection count?
- Count page URLs: are any custom pages missing?
- Submit the sitemap index URL to GSC: Settings → Sitemaps → Add sitemap
GSC’s Sitemaps report will show how many URLs were submitted vs. how many were indexed. A large gap between submitted and indexed indicates either thin/duplicate content (content problem) or crawl budget exhaustion (technical problem).
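The counting steps above can be scripted rather than done by hand. The following sketch parses sitemap XML that you have already downloaded and counts the URLs under a given path; the helper names are assumptions for this example, and fetching each file is left to whatever HTTP client you prefer.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml: str) -> list[str]:
    """Return the sub-sitemap URLs listed in a sitemap index document."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


def page_urls(sitemap_xml: str) -> list[str]:
    """Return the page URLs listed in a single sub-sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def count_matching(urls: list[str], path_fragment: str) -> int:
    """Count URLs containing `path_fragment`, e.g. "/products/"."""
    return sum(1 for url in urls if path_fragment in url)
```

Summing `count_matching(page_urls(xml), "/products/")` across every product sub-sitemap gives a number to compare against the published product count in Shopify admin.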
Pinging Search Engines After New Content
Shopify does not automatically ping search engines when new content is published. Google and Bing discover new URLs through their regular sitemap polling schedule — which for a mid-size Shopify store is typically every 24–72 hours.
For time-sensitive content (a product launch, a sale page, a trend-driven blog post), waiting 72 hours is too slow. Use manual pings:
Google: Submit the URL directly in GSC → URL Inspection → Request Indexing. Limit: roughly 10 manual submissions per day.
Bing: Submit via Bing Webmaster Tools → URL Submission. Bing also accepts IndexNow pings, which are typically processed faster than Google’s manual submissions; the IndexNow protocol accepts up to 10,000 URLs in a single API request.
IndexNow: A protocol supported by Bing, Yandex, and Seznam (not yet Google) that lets you push new URLs proactively. Several Shopify apps support IndexNow (Yoast, Sitemap Ping). With IndexNow, new product pages are typically crawled within 1–2 hours of ping.
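For stores that prefer to ping directly rather than through an app, an IndexNow submission is a single JSON POST. The sketch below follows the request shape described in the public IndexNow documentation (`host`, `key`, `keyLocation`, `urlList`). The function names are illustrative, and it assumes the key file is reachable at `https://<host>/<key>.txt`; how you host that file on a Shopify store is outside this sketch.

```python
import json
import urllib.request
from urllib.parse import urlsplit

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"
MAX_URLS_PER_REQUEST = 10_000  # per-request cap stated in the IndexNow docs


def build_indexnow_payload(host: str, key: str, urls: list[str]) -> dict:
    """Build the JSON body for an IndexNow submission."""
    if len(urls) > MAX_URLS_PER_REQUEST:
        raise ValueError("IndexNow accepts at most 10,000 URLs per request")
    for url in urls:
        if urlsplit(url).hostname != host:
            raise ValueError(f"{url} does not belong to host {host}")
    return {
        "host": host,
        "key": key,
        # Assumed location of the key file; adjust to where yours is hosted.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }


def submit_to_indexnow(payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to the shared IndexNow endpoint; return the HTTP status."""
    request = urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping payload construction separate from the network call makes it easy to validate a batch of URLs before anything is sent.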
Recommended pinging cadence
| Content type | Action | Timing |
|---|---|---|
| New product launch | Manual GSC submission + IndexNow | Immediately on publish |
| New collection page | Manual GSC submission | Within 1 hour of publish |
| New blog post | IndexNow ping (automated) | On publish |
| Price update (Sale start) | No ping needed | Crawl discovers via sitemap |
| Bulk product import (50+) | Submit sitemap index to GSC | Within 24 hours |
Sample Sitemap Structure for Shopify Verticals
What a well-structured sitemap index looks like for a 300-product jewelry store:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!-- Shopify auto-generated -->
<sitemap>
<loc>https://yourdomain.com/sitemap_pages_1.xml</loc>
<lastmod>2026-05-20</lastmod>
</sitemap>
<sitemap>
<loc>https://yourdomain.com/sitemap_products_1.xml</loc>
<lastmod>2026-05-20</lastmod>
</sitemap>
<sitemap>
<loc>https://yourdomain.com/sitemap_collections_1.xml</loc>
<lastmod>2026-05-20</lastmod>
</sitemap>
<sitemap>
<loc>https://yourdomain.com/sitemap_blogs_1.xml</loc>
<lastmod>2026-05-20</lastmod>
</sitemap>
<!-- Custom additions -->
<sitemap>
<loc>https://yourdomain.com/sitemap_images.xml</loc>
<lastmod>2026-05-20</lastmod>
</sitemap>
</sitemapindex>
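The sitemap_images.xml is a custom file you maintain (or generate via app) to cover the multi-image gap. Shopify generates the index itself and does not expose it for editing, so in practice the custom file is usually submitted to GSC as its own sitemap alongside the Shopify index.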
For a supplements store vs. a jewelry store vs. an apparel store, the relative sitemap weight differs:
| Vertical | Priority sub-sitemap | Why |
|---|---|---|
| Jewelry | Products + Images | High product count, image search is primary discovery channel |
| Supplements | Products + Blog | Ingredient/benefit blog content drives discovery |
| Apparel | Collections + Products | Category-level queries dominate; filter URL management is critical |
Connecting Your Sitemap to llms.txt
Your sitemap.xml is for traditional search crawlers. Your llms.txt is its AI-era equivalent — a human-readable (and LLM-readable) text file at yourdomain.com/llms.txt that tells AI crawlers which pages are worth indexing for citation purposes.
The relationship between the two is complementary, not redundant:
- sitemap.xml tells Google which pages exist and when they were last modified
- llms.txt tells ChatGPT, Claude, Perplexity, and other AI crawlers which pages matter most and what they are about
A minimal llms.txt for a 300-product jewelry store:
# Aurum Jewelry
> Minimalist fine jewelry for everyday wear
## Collections
- [Sterling silver rings](https://yourdomain.com/collections/sterling-silver-rings): Full range of 925 sterling silver rings, from minimalist bands to statement pieces. 80+ products.
- [Gold rings](https://yourdomain.com/collections/gold-rings): 14K and 18K solid gold rings. Not gold-filled or gold-plated.
- [Engagement rings](https://yourdomain.com/collections/engagement-rings): Moissanite and lab diamond engagement rings, $300–$2,400.
## Top Products
- [Classic silver band](https://yourdomain.com/products/classic-silver-band): Best-selling 2mm sterling silver stacking ring. Tarnish-resistant. $38.
- [Moissanite solitaire](https://yourdomain.com/products/moissanite-solitaire): 1ct equivalent moissanite in 14K gold. GIA-equivalent grading. $480.
## Editorial
- [Moissanite vs. lab diamond](https://yourdomain.com/blogs/news/moissanite-vs-lab-diamond): 2,400-word comparison with pricing, hardness, and resale value data.
- [About](https://yourdomain.com/pages/about): Founding story, sourcing standards, and certifications.
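If you would rather generate this file than hand-edit it, a small renderer is enough. The sketch below is one possible approach, not an official tool: `Entry` and `render_llms_txt` are names invented for this example. It writes each entry as a markdown link followed by a description, which is the list form described in the llms.txt proposal.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str   # link text shown to the reader
    url: str    # absolute URL of the page
    notes: str  # one-line description of what the page covers


def render_llms_txt(title: str, summary: str, sections: dict[str, list[Entry]]) -> str:
    """Render an llms.txt document: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, entries in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for entry in entries:
            lines.append(f"- [{entry.name}]({entry.url}): {entry.notes}")
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"
```

Feeding it the collections, top products, and editorial pages you want prioritized produces a file you can regenerate whenever the catalog changes.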
Inxy generates and maintains llms.txt from your sitemap, traffic data, and top-ranking pages — keeping the most commercially important pages prioritized as your catalog changes.
The Sitemap Health Checklist
| Check | How to verify | Pass/Fail |
|---|---|---|
| All sub-sitemaps accessible | Open each URL from sitemap index | 200 status, valid XML |
| Product count matches admin | Count sitemap URLs vs admin count | Within 5% |
| No filter URLs in sitemap | Search sitemap for ?filter | Zero results |
| Sitemap submitted to GSC | GSC → Sitemaps → Status | "Success" |
| lastmod dates are recent for active products | Sample 10 products | Updated within 30 days |
| Images sitemap exists (if image search matters) | Check sitemap index for image sitemap | Present |
| llms.txt present and current | Open yourdomain.com/llms.txt | 200 status, content current |
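Two of these checks are easy to automate once you have the list of URLs from the sitemap. The sketch below covers the 5% count tolerance and the filter-URL check; the function names are illustrative, and it assumes filter variants show up as query parameters whose names begin with `filter`, matching the `?filter` search in the table.

```python
from urllib.parse import parse_qsl, urlsplit


def count_within_tolerance(
    sitemap_count: int, admin_count: int, tolerance: float = 0.05
) -> bool:
    """True when the sitemap count is within `tolerance` of the admin count."""
    if admin_count == 0:
        return sitemap_count == 0
    return abs(sitemap_count - admin_count) / admin_count <= tolerance


def has_filter_params(url: str) -> bool:
    """True when the URL carries a query parameter whose name starts with 'filter'."""
    query = urlsplit(url).query
    return any(
        key.startswith("filter")
        for key, _ in parse_qsl(query, keep_blank_values=True)
    )


def filter_urls(urls: list[str]) -> list[str]:
    """Return the sitemap URLs that look like filtered collection variants."""
    return [url for url in urls if has_filter_params(url)]
```

A passing store returns `True` from `count_within_tolerance` and an empty list from `filter_urls`.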
Next: Bridging Shopify SEO to AI SEO — where traditional Shopify SEO ends and AI SEO begins: the 5 concrete moves to make your store citable by ChatGPT, Perplexity, and AI Overviews.