How can I remove a spam page from my website index?

It’s definitely not theme based, as over 200,000 sites are being affected. Honestly, this is a big issue and seems to be a vulnerability on Shopify’s end. I have yet to find a proper solution but am doing all I can. Any update would be greatly appreciated.

Apologies, when I say theme based I mean that many themes are impacted by this, and that it may need a different fix per theme. It is interesting that Skims don’t have this issue. I do believe the key is in serving a 404 for collection pages with 0 product results, but I’m just not sure how to do this.

Hi Shay,

The query string can be changed to anything, and Shopify is currently set to print whatever the query is on the page, for example:

https://www.french-address.com/collections/vendors?q=hi-shay

So all anyone wanting to take advantage of this needs to do is know that a site is on Shopify. I believe an answer lies in returning a 404 for these queries rather than printing the query on the page, as to Google this just makes it a “unique page, suitable for indexing”.

This needs solving ASAP, we’re up to 17k indexed pages for a site that should have 200.

That is why you noindex. If a page is marked noindex, it doesn’t matter how many sites link to you. Google will ignore them while dropping these 0 result pages from their index.

Hi Greg

Two weeks ago I tried the noindex proposed by Jizo_Inagaki:

{%- if request.path == '/collections/vendors' and collection.all_products_count == 0 -%}
<meta name="robots" content="noindex">
{%- endif -%}

This solution seemed to work at first. But I just checked my Google Search Console, and these pages are still being indexed!

Greg, please let us know if you have a better noindex solution.

Hi Shadi1, noindex is the way, and it is nearly always respected by Google. Reports in GSC lag and are not current. It can take a few weeks or more for Google to drop a page, depending on how well-trafficked it is, and then more time for Google to update GSC. The best way to see the current status is the “Inspect URL” tool. Also, if you have a small set of pages you set to noindex, you can use the “Inspect URL” tool to request reindexing, which will speed things up. Note that if you block those pages in robots.txt before Google can recrawl them and see the noindex, it will not drop the pages from the index, because it can’t see the change.

Hi there, @DaveSweetCures

Wondering if you could elaborate on your temp fix v2. Do you mean an alteration to:

{% if request.path == '/collections/vendors' and collection.all_products_count == 0 -%}
<meta name="robots" content="noindex">
{% endif %}

I placed the code above in our theme yesterday, but if you see better results with an alteration, please do share.

Cheers,

Melissa

Hi @Mels1

After a bit of a sleepless night, this is where I’m at: placing the following code in theme.liquid

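{%- if canonical_url != blank -%}
{% if request.path == '/collections/vendors' and collection.all_products_count == 0 -%}
<meta name="robots" content="noindex">
<link rel="canonical" href="{{ shop.url }}">
{% else %}
<link rel="canonical" href="{{ canonical_url }}">
{% endif %}
{%- endif -%}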

The above will do two things:

It adds a robots noindex meta tag to each ?q= page where that vendor has 0 products,

and it also places a canonical tag pointing to the homepage on those pages, to give Google a second indication that you don’t want these pages to be ranked in Google.

Please note:

  1. This is a temp fix for us. We want to prevent the site printing the query on a page, full stop, and are working on a fix to send these types of requests to a 404.
  2. That said, we will probably leave this code in as a backup.
  3. You can easily remove thousands of pages from Google by using their removal tool and selecting the prefix option for yourcompany.com/collections/vendors?q=
  4. Please check your current canonical setup before deploying. If you don’t want the additional canonical setup, use the code below (that you’ve used already).

{% if request.path == '/collections/vendors' and collection.all_products_count == 0 -%}
<meta name="robots" content="noindex">
{% endif %}

  5. I will update once the team have sorted a fix that 404s these queries.
  6. If anyone can write this better, please feel free to update. Thank you.

I would remove the canonical part of your Liquid; it’s unnecessary. Pages that aren’t indexed are not ranked. And if you canonicalize pages en masse to the homepage, you may be shooting yourself in the foot unless you know every page on your site that it may affect. But really, again, it’s not necessary and only adds complexity, which is a risk.
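
To make the suggestion above concrete, here is a minimal sketch of a noindex-only version, assuming it sits in the head of layout/theme.liquid and that the theme already outputs its own canonical tag elsewhere (which is left untouched). It uses only the condition already posted in this thread.

```liquid
{%- comment -%}
  Sketch only: place inside <head> in layout/theme.liquid.
  Adds noindex to vendor query pages that return no products.
  The theme's existing canonical tag is deliberately not changed.
{%- endcomment -%}
{%- if request.path == '/collections/vendors' and collection.all_products_count == 0 -%}
  <meta name="robots" content="noindex">
{%- endif -%}
```

Vendor pages that do list products never match the condition, so they stay indexable.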

Hi, I just saw these links in my Google Search Console. FS. Is there a non-techy solution? Has @Shopify_77 done anything about this? Does it affect our rankings?

@fabmol1 have you read through this thread?

Hi guys, so I found the same issue today. I think the ideal solution so far is first to ask Google to temporarily remove those links with the same prefix /collections/vendors?q= and then add the code provided above by @DaveSweetCures to prevent other pages from being created in the same way. And robots.txt could be just a backup.

Hi @z285chen

That’s two of several steps we’ve implemented. The issue with just requesting that Google removes the URLs is that it’s a temporary fix; meanwhile millions more links are being built and pointed at your site. This isn’t to say you shouldn’t do this, but bear in mind the problem is still there. Likewise with robots.txt as a backup: robots.txt only stops crawling, and Google can still list a blocked URL if it finds it through a link, so just using those two solutions won’t work. We’ve set up the following, which you are welcome to check on our site https://www.sweetcures.co.uk/collections/vendors?q=tryit

  1. Our web team prevented the query being printed in the body of the page using the code below. Please note this fix may vary depending on the theme used.

Update sections/collection-content.liquid

Line one, above all script:
{%- unless collection.handle == 'vendors' -%}
After all script, above schema:
{%- else -%}

{{ 'general.404.title' | t }}

{{ 'general.404.text' | t }}

{{ 'general.404.subtext_html' | t }}

{%- endunless -%}

  2. Prevent the query being printed in the title tag.

Update the theme.liquid title tag to be as below:

<title>
{%- if request.path == '/collections/vendors' and collection.all_products_count == 0 %}
404 Not Found
{%- else %} {{ seo_title | strip }} {% endif %}
</title>

  3. Print a noindex on these pages.

{% if request.path == '/collections/vendors' and collection.all_products_count == 0 -%}
<meta name="robots" content="noindex">
{% endif %}

  4. Add to the robots.txt file. Note this is merely a backup: Google uses it as nothing more than a guide and may still list URLs with inbound links.

{{ 'Disallow: /collections/vendors?q=' }}

  5. You can easily remove thousands of pages from Google by using their removal tool and selecting the prefix option for yourcompany.com/collections/vendors?q=

  6. Use the Ahrefs free tool / webmaster tools to explore and download your link profile, and use this as the basis for a list to upload to the Google Disavow tool.

Where we are as of 06/12/2022: Google Webmasters shows 2,000,000 pages, 330,000 of which have been indexed. Bear in mind this is a 300 page site at most. Google Webmasters is slow, so I’m hopeful the above numbers will improve over time. Will update as we go.

I have to say that I still don’t feel this issue is fixed. Even with noindex and the above in place, the ability of a rogue party to create unique URLs makes this a very difficult issue to resolve when it’s not possible to return a 404 response for these queries. Really, Shopify should make the vendor setup a YES / NO tick box option: are you going to use it or not? And then simply close this door with a 404 for those who aren’t going to use it. Shopify are responsible for not millions, but hundreds of billions of pages of web spam. Passing it off as a theme issue simply isn’t true. Site owners really should be lobbying Shopify, as this does have the potential to ruin link profiles and even take sites down.

@gregbernhardt Please could you request that Shopify update this vendor setup to be a Y/N tick box option? This would instantly solve this problem and remove billions of pages of spam. Thank you.

@DaveSweetCures nice job removing the printout. That is half the battle. Once the incentive is removed, the spammers will move on. If they can’t print out the query, then there is nothing for them to do. You may want to reconsider the 404 language, however, as the page does exist; it’s just that no vendors or search results are found. That is a different communication to users. Also, your robots block is preventing Google from seeing the noindex tag, so the pages will remain indexed. Only add the block once the pages are deindexed.
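
As a rough sketch of the “no results” wording suggested above, assuming it wraps the existing markup in the theme’s collection section (sections/collection-content.liquid in the theme discussed here): the heading and paragraph text are placeholders rather than real translation keys, and the query is never printed.

```liquid
{%- if collection.handle == 'vendors' and collection.all_products_count == 0 -%}
  {%- comment -%} Placeholder copy; nothing from the ?q= value is output. {%- endcomment -%}
  <h1>No products found</h1>
  <p>
    We couldn't find any products for that vendor.
    <a href="{{ routes.all_products_collection_url }}">Browse all products</a>
  </p>
{%- else -%}
  {%- comment -%} The theme's existing collection markup stays here. {%- endcomment -%}
{%- endif -%}
```

Unlike a blanket check on the vendors handle, this condition only replaces the body when the vendor query returns zero products, so genuine vendor pages still render normally.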

Cheers @gregbernhardt - that’s actually a very good point re the robots.txt - I’ll update the post. Thank you.

So could we say this is potentially a solution?

  1. Our web team prevented the query being printed in the body of the page using the code below. Please note this fix may vary depending on the theme used. Please change the message to whatever is suitable: we’ve used a 404 message, but this may not be suitable or relevant for everyone (steps 1 & 2).

Update sections/collection-content.liquid

Line one, above all script:
{%- unless collection.handle == 'vendors' -%}
After all script, above schema:
{%- else -%}

{{ 'general.404.title' | t }}

{{ 'general.404.text' | t }}

{{ 'general.404.subtext_html' | t }}

{%- endunless -%}

  2. Prevent the query being printed in the title tag.

Update the theme.liquid title tag to be as below:

<title>
{%- if request.path == '/collections/vendors' and collection.all_products_count == 0 %}
404 Not Found
{%- else %} {{ seo_title | strip }} {% endif %}
</title>

  3. Print a noindex on these pages.

{% if request.path == '/collections/vendors' and collection.all_products_count == 0 -%}
<meta name="robots" content="noindex">
{% endif %}

  4. You can easily remove thousands of pages from Google by using their removal tool and selecting the prefix option for yourcompany.com/collections/vendors?q=

  5. Use the Ahrefs free tool / webmaster tools to explore and download your link profile, and use this as the basis for a list to upload to the Google Disavow tool.

  6. Add to the robots.txt file, but only after the pages are no longer indexed by Google. Note this is merely a backup: Google uses it as nothing more than a guide and may still list URLs with inbound links.

{{ 'Disallow: /collections/vendors?q=' }}
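
For the robots.txt step, a minimal sketch of how the rule could be added, assuming the store overrides the default file with a templates/robots.txt.liquid template and wants to keep all of Shopify’s default rules. The full /collections/vendors?q= path is used here because robots.txt rules are matched from the start of the URL path. As discussed in this thread, it should only go live once the pages have dropped out of the index.

```liquid
{%- comment -%}
  Sketch only: templates/robots.txt.liquid.
  Re-emits every default group and appends one extra Disallow
  to the group that applies to all crawlers.
{%- endcomment -%}
{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /collections/vendors?q=' }}
  {%- endif -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}
```

Looping over the default groups, rather than writing the file by hand, keeps Shopify’s own rules in place if they change later.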

Can Shopify advise on the code to use to redirect to a 404 page using Liquid? That would be a step in the right direction. This is a widespread problem for store owners, and there seems to be a focus by Shopify on the backlinks and not the actual issue, which is that we just want to prevent these pages from being generated by a search string. SKIMS has a good way of dealing with this, and it would hopefully assist the wider community if Shopify could communicate how to achieve the same result.

Hello Dave, I need clarification on steps 4 and 5. Can I use the Google Disavow Tool for my URL? I thought this tool only worked to remove backlinks from other sites pointing to mine instead of links with my own URL, which is the case for this specific issue.

Thanks, @DaveSweetCures, for the detailed steps. I think it’s a temporary workaround for us until Shopify realizes this is a widespread, serious bug. Anyway, we could not 404 those pages; I’ve also checked other websites like Allbirds and Fashion Nova, which just redirect them to their homepage or to collections/vendors instead.
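
On the redirect approach: as far as I know, theme Liquid cannot change the HTTP status code or issue a server-side redirect, so the closest thing available inside a theme is a client-side redirect. The sketch below is one possible way to do that and is not how Allbirds or Fashion Nova are known to implement it. It assumes placement in the head of layout/theme.liquid and keeps the noindex tag for crawlers that do not run JavaScript.

```liquid
{%- if request.path == '/collections/vendors' and collection.all_products_count == 0 -%}
  <meta name="robots" content="noindex">
  <script>
    // Client-side redirect to the store homepage. replace() is used so the
    // spam URL is not left behind in the visitor's browser history.
    window.location.replace({{ routes.root_url | json }});
  </script>
{%- endif -%}
```

The page still returns a 200 response before the script runs, so this is not a substitute for a real 404; the noindex tag is what tells Google to drop the URL.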