How can I remove URLs from the sitemap and NOINDEX,FOLLOW the removed page via the API?

jahodson3
Tourist
9 0 9

We were told by the developer of one of the most popular sitemap and noindex manager apps on Shopify that Shopify forces a NOINDEX, NOFOLLOW when you remove a URL from the sitemap via the API, which is crazy... And I am trying to determine if this is indeed the case.

Yes, if you NOINDEX a page with a <meta name="robots" /> element, you should always remove it from the sitemap.xml as it will trigger errors with the search engines because it's ambiguous (the meta robots says don't index it while the URL's existence in the sitemap.xml says do index it).  However, the converse is not necessarily true.  Just because you remove a file from the sitemap.xml does NOT mean you always want it flagged NOINDEX.

 

But what is FAR worse than forcing a NOINDEX when you remove a URL from the sitemap.xml is to force a NOFOLLOW in the <meta name="robots" /> element. It's VERY detrimental to SEO and the ability of ALL URLs on the site to rank. It creates PageRank/link juice black holes in your site where PageRank/link juice flows into a page but cannot flow back out. This drains your site of link juice and drastically reduces the ability of all of your site's URLs to rank.

 

Is there documentation for the Shopify API that shows ALL metafield keys, ALL values valid for each key, and what the values do as they relate to sitemap, site product search, and meta robots?  I haven't seen much in the way of complete documentation, only a couple of examples here https://help.shopify.com/en/api/guides/updating-seo-data.

Thanks.

 

 


_JB
Shopify Staff
836 100 222

Hey @jahodson3,

 

With regards to the seo "hidden" metafield mentioned in this guide, you're correct that adding this metafield will add both the NOINDEX and NOFOLLOW tags to the resource. This is currently the only metafield that allows configuration of SEO index behaviour.

 

Note that this method of adding the SEO tags through a metafield was created as a quick and easy way to remove a particular product/resource from the search engine index and from storefront search results. In cases where something has been published/indexed that a merchant wants removed, this metafield can be added to remove the resource from search engine and storefront search results. This should only be used on individual resources, and not on theme templates. If you're looking to add just the NOINDEX tag to a particular resource, this can be done on theme templates. In case you haven't already seen it, this guide shows how to add a liquid tag that will render a NOINDEX meta tag on the storefront for a particular resource.

JB | Solutions Engineer @ Shopify 
 - Was your question answered? Mark it as an Accepted Solution
 - To learn more visit Shopify.dev or the Shopify Web Design and Development Blog

jahodson3
Tourist
9 0 9

Thanks for the reply JB. This is not something that would seem doable at the theme level. I need to selectively choose particular collections, products, and pages that I don't want in the sitemap and/or site search and/or with noindex and/or noarchive.

Is there a way, using the API, that I could write an app that would provide functionality for users to independently control at the collection, page, and product level the ability to:

 

1) remove that URL from the sitemap

2) remove that URL from the on-site search

3) add  NOINDEX

4) add NOFOLLOW (really don't even want this, should never be used but would do for completeness)

5) add NOARCHIVE

 

I would want the app to have the 5 checkboxes above and control each option independently of each other so that users could select them in any combination.  For example, there are times that you might want to remove a URL from the sitemap, but leave it in site search and leave it indexable. And while I might include a NOFOLLOW option in the app, I would put a warning when selected that this could be detrimental to the site's visibility in organic search (SEO).  

 

I have always heard that Shopify has challenges when it comes to SEO, and now that I've worked with a couple of clients on Shopify I know this is the biggest SEO flaw I've seen with the platform other than page load times.  No knowledgeable SEO  would ever suggest adding NOFOLLOW to a meta robots element because it literally siphons off link juice and drains the site of link juice.

 

Is there a way to JUST remove a URL from the sitemap without triggering a meta robots with NOINDEX, NOFOLLOW?

_JB
Shopify Staff
836 100 222

Hey @jahodson3,

 

I've sent up your feedback regarding the addition of NOFOLLOW in the current metafield. It's very possible there are things I'm not aware of that were considered when the decision was made to add NOFOLLOW with the metafield, but I definitely see where you're coming from in this regard. I can't make any promises as to if/when this will change, but your feedback is in the right hands now. 

 

For your questions, #1 and #2 can be achieved with the metafield we discussed. Note that this metafield is currently the only way to remove a URL from the sitemap. #3, 4, and 5 can be achieved by adding code to the theme template. One idea could be to use a code snippet on the collection, page, or product template, containing conditional logic for rendering the HTML for NOINDEX, NOFOLLOW, or NOARCHIVE based on the resource's tags. Your app can inject this snippet on the template, and then add/remove tags based on the user's selection. This is just one idea, there may be other more elegant ways to do this as well. You may also want to explore using metafields instead of tags for the logic.
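The tag-driven idea above can be sketched as follows. This is only an illustration of the conditional logic, written in Python rather than Liquid; the tag names (`seo-noindex`, `seo-nofollow`, `seo-noarchive`) and the function name are hypothetical and do not come from Shopify or from this thread.

```python
# Hypothetical tag names: the thread does not define any, these are placeholders.
TAG_TO_DIRECTIVE = {
    "seo-noindex": "noindex",
    "seo-nofollow": "nofollow",
    "seo-noarchive": "noarchive",
}


def robots_meta(tags):
    """Return a robots meta element for a resource's tags, or '' if no SEO tag is set."""
    tag_set = {t.strip().lower() for t in tags}
    # Keep the directives in a fixed order so the output is stable.
    directives = [d for tag, d in TAG_TO_DIRECTIVE.items() if tag in tag_set]
    if not directives:
        return ""  # render nothing, so crawlers fall back to the default index, follow
    return '<meta name="robots" content="{}">'.format(", ".join(directives))


print(robots_meta(["Summer", "seo-noindex", "seo-noarchive"]))
# prints: <meta name="robots" content="noindex, noarchive">
```

Because each directive is driven by its own tag, the options stay independent of each other, which is what the five-checkbox request asks for. A theme snippet would express the same branching with Liquid conditionals on the resource's tags or metafields.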


jahodson3
Tourist
9 0 9

Thanks so much for the reply JB.  I appreciate you passing this on.  It would be ideal if they could decouple everything.   For example, currently:    

 

POST /admin/api/2020-01/products/#{id}/metafields.json
{
  "metafield": {
    "namespace": "seo",
    "key": "hidden",
    "value": 1,
    "value_type": "integer"
  }
}

  

removes the URL from the sitemap, removes from site search, and adds noindex, nofollow.  You could continue to support this existing treatment when value=1 while decoupling by adding new values for the existing metafield namespace=seo, key=hidden such as:


value=1: Continue doing what you do today

value=2: Remove from sitemap and sitesearch only (and do NOT add a meta robots at all)

value=3: Remove from sitemap only

value=4: Remove from site search only

 

If we had the above then users can handle generating the meta robots the way they want it.  It's the fact that removing a URL from the sitemap with the existing method has three other side effects (remove from search, add noindex, add nofollow) which is problematic.

Again thanks so much for responding and looking into a possible fix/enhancement.
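The proposed values can be summarised as a lookup table. This is a sketch of the proposal only: value 1 reflects the behaviour described in this thread, while values 2 to 4 are the suggestion made in this post and are not something Shopify implements. All names in the block are made up for illustration.

```python
# Effects named in the thread for the seo.hidden metafield.
SITEMAP = "remove_from_sitemap"
SEARCH = "remove_from_site_search"
NOINDEX = "noindex"
NOFOLLOW = "nofollow"

# value=1: current behaviour as described in the thread.
# value=2..4: PROPOSED in this thread only, not implemented by Shopify.
PROPOSED_EFFECTS = {
    1: frozenset({SITEMAP, SEARCH, NOINDEX, NOFOLLOW}),
    2: frozenset({SITEMAP, SEARCH}),
    3: frozenset({SITEMAP}),
    4: frozenset({SEARCH}),
}


def effects_for(value):
    """Return the set of side effects for a seo.hidden value (empty if unset or unknown)."""
    return PROPOSED_EFFECTS.get(value, frozenset())


def adds_meta_robots(value):
    """True when the platform itself would emit a meta robots tag for this value."""
    return bool(effects_for(value) & {NOINDEX, NOFOLLOW})


for value in sorted(PROPOSED_EFFECTS):
    print(value, sorted(effects_for(value)), adds_meta_robots(value))
```

Under this scheme only value 1 would make the platform emit a meta robots tag, so existing stores and apps would see no change.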

David_OL
Shopify Partner
58 1 4

Any update on this?

banned
Maaike
Visitor
1 0 2

Hi,

Is there an update on this issue? I am really waiting for the possibility to:

1) remove a URL from the sitemap

2) remove that same URL from the on-site search

3) add  NOINDEX

4) add FOLLOW


Right now it is only possible to remove a URL from the sitemap if you also add NOFOLLOW, and you don't want to add NOFOLLOW from an SEO perspective.
Is there any possibility to do this now?
Hope to hear from you.

formulaswiss
Visitor
1 0 0

Any updates? I really need to remove some canonicalized pages from the sitemap but keep them index, follow.

jahodson3
Tourist
9 0 9

Crickets... I have heard nothing since posting and them saying they were going to raise the issue to the developers. It's funny, as the current way this is implemented is very detrimental from an SEO perspective, and this should be a VERY minor fix (likely one or two dozen lines of strategically placed code in the API) that would supply much-needed functionality while leaving the existing calls backward compatible for those sites/apps that don't mind their link juice flowing into a black hole. It appears they don't have anyone with a lot of SEO talent. I saw the other day they were advertising for an SEO.

cescapesca_86
Excursionist
20 0 13

Hi, can someone tell me if I am doing this correctly? I read the tutorial but I don't know how to do this JSON post thing, I am not a developer.

My goal is to remove two pages from the Shopify automatically-generated sitemap. These pages are visible and I cannot just put them as hidden because they are used as sections of different pages. However, I want to remove the pages themselves from the sitemap.

So what I did is how I normally edit metafields. I went to https://testeliz.myshopify.com/admin/bulk?resource_name=Page&edit=metafields.seo.hidden and put a value of 1 in those two pages, please see the screenshot. For the other pages which should remain in the sitemap, I left the metafield empty. Is this correct?

How can I test if these pages are really no longer in the sitemap? If I go now to https://testeliz.myshopify.com/sitemap_pages_1.xml they seem to be gone but I'm not 100% sure if I did it correctly, so I would love some feedback.

Thanks!

Francesca

Screenshot 2021-05-30 at 12.45.01.png
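One way to check this without eyeballing the XML is to parse the sitemap and look for the URL. The sketch below uses only the Python standard library and runs against an inline sample; the sample URLs and function names are placeholders, and it assumes the file follows the standard sitemaps.org schema.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_in_sitemap(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def is_listed(xml_text, url):
    """True if the exact URL appears in the sitemap."""
    return url in urls_in_sitemap(xml_text)


# Placeholder data standing in for the contents of sitemap_pages_1.xml.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/pages/about</loc></url>
  <url><loc>https://example.com/pages/contact</loc></url>
</urlset>"""

print(is_listed(SAMPLE, "https://example.com/pages/about"))
print(is_listed(SAMPLE, "https://example.com/pages/hidden-section"))
```

To run it against a live store, the text of the sitemap file would be downloaded first (for example with `urllib.request.urlopen`) and passed in as `xml_text`. Note that this only confirms the sitemap contents, not what a search engine has indexed.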

DabsDesign
Shopify Partner
43 1 10

Firstly thank you Francesca for sharing this information and yes, I also used the same method and after some tests we noticed that the URL is no longer on the sitemap.

Once removed, it will no longer be listed by search engines such as Google.

In addition to the way you mentioned you can do it, you can also use the Metafields Editor app, which is very good by the way.

Interesting links:

https://www.shopify.co.uk/partners/blog/110057030-using-metafields-in-your-shopify-theme

https://www.sunbowlsystems.com/blogs/how-to/metafields-in-shopify-without-using-an-app

 

DABS Design - Your Shopify Agency in Brazil. We are Shopify and Shopify Plus specialists, offering consulting, support, store creation, app integration, and development for your online store. Count on a Shopify Expert Partner agency to boost your business! Get in touch with us. Transform your store with the Shopify specialists in Brazil!
cescapesca_86
Excursionist
20 0 13

Hi @DabsDesign and thanks for your confirmation. I also did some tests and everything is working as expected.

 

It was from that Sunbowl link that I learned how to do this actually. I was always told that an app was necessary, but it turns out it's not true which is great. We always try to have the fewest number of apps possible both for site speed and because all those small monthly fees add up fast. So when it's possible to do things without an app, it's definitely better.

 

Shopify you guys really need to make a user-friendly way to access and update the metafields. It should not be necessary to use weird workarounds or an app just to update important fields about our products!

chris214
Excursionist
46 0 10

Thank you @cescapesca_86 and @DabsDesign for your explanation of how to hide pages from Google manually. Could you double-check the bulk editor for resource_name=Page?

 

/admin/bulk?resource_name=Product&edit=metafields.seo.hidden works for me

/admin/bulk?resource_name=Page&edit=metafields.seo.hidden DOES NOT work, the page was not found
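If the bulk editor URL is not available for pages, the API request shown earlier in the thread may be an alternative. The sketch below only builds the request path and JSON body and does not send anything. It assumes that pages accept the same metafield endpoint pattern as the products example, which this thread does not confirm; the function name and the ID are placeholders.

```python
import json

API_VERSION = "2020-01"  # version used in the products example earlier in the thread


def hidden_metafield_request(resource, resource_id):
    """Build (path, body) for setting seo.hidden = 1 on a resource.

    ASSUMPTION: `resource` values other than "products" (e.g. "pages")
    follow the same URL pattern as the products example in this thread.
    """
    path = "/admin/api/{}/{}/{}/metafields.json".format(
        API_VERSION, resource, resource_id
    )
    body = {
        "metafield": {
            "namespace": "seo",
            "key": "hidden",
            "value": 1,
            "value_type": "integer",
        }
    }
    return path, json.dumps(body)


path, body = hidden_metafield_request("pages", 123456789)  # placeholder ID
print("POST", path)
print(body)
```

As noted earlier in the thread, setting this metafield also removes the resource from storefront search and adds noindex, nofollow, so it is not a sitemap-only switch.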

demib
Shopify Partner
132 13 62

@DabsDesign wrote:

Once removed, it will no longer be listed by search engines such as Google.

Not entirely true 🙂🙂

The META robots tag (like all other META data) is considered a "signal" by Google, not a "directive". This means that they listen to the signal and take it into (automatic) consideration, but they do not promise to always respect it.

There can be many reasons why they choose to keep indexing a page that has META robots NOINDEX. Sometimes it's because so many pages link to it that they judge it important to keep. There could also be other (more or less valid) reasons. But what we have to understand is that it is not a directive, and not a secure way to remove a page from Google's index.

The XML sitemap has similar problems. It is again just a signal. It's basically a list of URLs we "suggest" Google should crawl. They often follow that suggestion, but again it's not a directive. We cannot be sure. Likewise, if we remove a URL from the sitemap, there is absolutely no guarantee it will be removed from the index.

Robots.txt, on the other hand, is a directive. Google promises to respect it (just as all other bots and agents should). So if we exclude a file or directory from crawling in the robots.txt file, Google will not crawl it.

However, you need to understand that crawling and indexing are not the same. Google quite often includes URLs in its index that it has not crawled. Some estimates have actually suggested that up to 1/3 of Google's index is uncrawled URLs. Robots.txt does not prevent indexing; it only prevents crawling.

In reality, of course, a page that is not crawled will most often not rank for anything, because the only keywords Google can associate it with are based on words in the URL and on the pages linking to it.
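The crawl side of this distinction can be checked with Python's standard `urllib.robotparser`. The sketch below answers only the question "may this user agent crawl this URL under these rules?"; it says nothing about whether the URL ends up in an index. The rules, URLs, and function name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules, not taken from any real store.
RULES = """\
User-agent: *
Disallow: /pages/hidden-section
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt allows user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl(RULES, "Googlebot", "https://example.com/pages/hidden-section"))
print(may_crawl(RULES, "Googlebot", "https://example.com/pages/about"))
```

A URL that is blocked here can still be indexed from links pointing at it, and a crawler that cannot fetch the page also cannot see a NOINDEX tag on it.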

I, too, would LOVE it if Shopify would extend the seo hidden field with the 3 additional variants! That should be a very easy fix and very, very useful!

SEO Geek since 1996, consultant author and public speaker. Admin of the Shopify SEO Facebook Group

Was your question answered? Kindly mark it as an Accepted Solution 🙂
mvu00
Excursionist
14 0 33

Is there an update on this please?  The suggestion by Jahodson3 would be an elegant implementation that is backwards compatible with existing merchants, but forward compatible for anyone using the alternative variables of 2,3 or 4.   Please Please please implement this. 

Copied again below: 

 

"You could continue to support this existing treatment when value=1 while decoupling by adding new values for the existing metafield namespace=seo, key=hidden such as:


value=1: Continue doing what you do today

value=2: Remove from sitemap and sitesearch only (and do NOT add a meta robots at all)

value=3: Remove from sitemap only

value=4: Remove from site search only

 

If we had the above then users can handle generating the meta robots the way they want it.  It's the fact that removing a URL from the sitemap with the existing method has three other side effects (remove from search, add noindex, add nofollow) which is problematic."

HenryAuffahrt
Shopify Partner
66 3 31

We are currently facing exactly the same problem. Thanks to this post, I now understand that this is how Shopify behaves here. This is unfortunately not optimal for SEO, I definitely agree with the opinion of @jahodson3  and many others in this thread!

 

We have found the following workaround for us: we manipulate the content_for_header object. I warn you all, be careful with changes to this object.

Have a quick read of this section in the Shopify docs: https://shopify.dev/themes/architecture/layouts#content

 

This is how we manipulate the content_for_header object:
(The first example is what I want, but I can NOT get it to work. Does anyone have suggestions?)

  {% capture h_content %}
      {{ content_for_header }}
  {% endcapture %}
  {{ h_content | remove: '<meta name="robots" content="noindex,nofollow">' }}

This workaround is not working. I found out that the matching of the equals sign "=" is what fails! Does anyone have an idea why the equals sign does not match in this case?

 

So I built this Workaround for the Workaround 😅: (It's working!)

  {% capture h_content %}
      {{ content_for_header }}
  {% endcapture %}
  {{ h_content | remove: ',nofollow' }}

When I just remove the ",nofollow" the outcome is:  <meta name="robots" content="noindex">
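The thread does not establish why the full-string `remove` fails, so the sketch below does not try to explain it. It only illustrates, in Python rather than Liquid, the transformation the working version performs: dropping the nofollow directive from a robots meta tag while leaving the rest intact, without depending on exact spacing or quoting. The regular expression and function name are illustrative assumptions.

```python
import re

# Matches a robots meta tag and captures its content attribute value.
ROBOTS_META = re.compile(
    r'(<meta\s+name=["\']robots["\']\s+content=["\'])([^"\']*)(["\'][^>]*>)',
    re.IGNORECASE,
)


def drop_nofollow(html):
    """Remove the nofollow directive from every robots meta tag in html."""

    def rewrite(match):
        directives = [d.strip() for d in match.group(2).split(",")]
        kept = [d for d in directives if d and d.lower() != "nofollow"]
        return match.group(1) + ", ".join(kept) + match.group(3)

    return ROBOTS_META.sub(rewrite, html)


print(drop_nofollow('<meta name="robots" content="noindex,nofollow">'))
# prints: <meta name="robots" content="noindex">
```

Liquid's `remove` filter works on literal strings only, so the theme-side version stays with the narrower `',nofollow'` match shown above.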

 

Which is fine for us, because we are using the Yoast Plugin. With this plugin we can configure each page/product/blog/... to "noindex,follow" and it will remove the page also from the sitemap. Yoast will create a meta robot tag like this: <meta name="robots" content="noindex, follow"> which is exactly what we want to accomplish.

 

Overall, this is what our <head> looks like now after the above change:

[..]
<!-- This site is optimized with Yoast SEO for Shopify -->
<meta name="robots" content="noindex, follow">
[..]
<!--/ Yoast SEO -->
[..]
<!-- START Content from manipulated content_for_header object -->
<meta name="robots" content="noindex">
[..]
<!-- END Content from manipulated content_for_header object -->
[..]

Unfortunately, we now have the meta robots tag in the <head> twice, but the two tags no longer send conflicting signals.
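To see why the duplicate is harmless here, the two tags can be combined under a "most restrictive rule wins" policy, which is how Google documents its handling of conflicting robots rules. The sketch below is a simplified model that looks only at index/follow and assumes that policy; other crawlers may behave differently, and the function name is hypothetical.

```python
def effective_robots(contents):
    """Combine several robots meta content values, most restrictive wins.

    Simplified model: only index/follow are considered, and "none" is
    treated as noindex plus nofollow.
    """
    seen = {
        d.strip().lower()
        for content in contents
        for d in content.split(",")
        if d.strip()
    }
    index = "noindex" if ("noindex" in seen or "none" in seen) else "index"
    follow = "nofollow" if ("nofollow" in seen or "none" in seen) else "follow"
    return "{}, {}".format(index, follow)


# After the workaround: the Yoast tag plus the trimmed Shopify tag.
print(effective_robots(["noindex, follow", "noindex"]))
# prints: noindex, follow

# Before the workaround: the Yoast tag plus Shopify's original tag.
print(effective_robots(["noindex, follow", "noindex,nofollow"]))
# prints: noindex, nofollow
```

Under this model the untouched Shopify tag would override the follow directive, which is the conflict the workaround removes.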

 

I hope this helps others and maybe someone has a solution for my first example? Thanks already to all!

SEO & Webdeveloper @ Better Sell Online
jahodson3
Tourist
9 0 9

All you need to do now is figure out how to turn off Yoast generating a meta robots tag.  Your workaround is generating the correct value ("noindex" which includes an implied "follow"). 

 

HenryAuffahrt
Shopify Partner
66 3 31

@jahodson3 wrote:

All you need to do now is figure out how to turn off Yoast generating a meta robots tag.  Your workaround is generating the correct value ("noindex" which includes an implied "follow"). 

 


That is correct, but we want to keep Yoast anyway. We are adding a lot of Rich-Snippets with it. Also, we use it to put noindex to some pages, because it is handy to do it via Yoast.

SEO & Webdeveloper @ Better Sell Online