Pattern matching on full URL

Solved
gviner99
Tourist

Hi there

I am very new to Shopify and have no idea how to do something. I'd like to add some meta tags to collections that have been filtered and therefore have the _ character in the URL. For example, I want the code to fire on /collections/shoes/colour_red but not on /collections/shoes.

I tried 

{% if collection.handle contains "_" %}
    ...
{% endif %}

but that didn't work. I've since discovered that collection.handle only returns shoes in the above example, and collection.url returns /collections/shoes.

There must surely be a way to return the full URL collections/shoes/colour_red, right?
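For reference, if the store's Liquid version exposes the request object (this is an assumption; not every theme at the time had it), its path property returns the full current path, including any tag segment, so a sketch like this could match filtered URLs:

```liquid
{% comment %}
  Sketch only -- assumes the Liquid "request" object is available in this
  theme. request.path would return e.g. /collections/shoes/colour_red
{% endcomment %}
{% if request.path contains "_" %}
  <meta name="robots" content="noindex,nofollow" />
{% endif %}
```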

Could someone please help me?

Thanks very much

Garry

0 Likes

Inspect the tags for the products instead? What is it that you're looking to achieve?

 

0 Likes
gviner99
Tourist

Thanks for the response.

What I am trying to achieve is to improve the SEO on my client's store. They have 1,000 products in about 150 categories, but many hundreds of thousands of URLs, given the crazy filtering system they have implemented, and crawling is a nightmare.

I can see that by default Shopify's robots.txt file excludes anything containing a + sign from crawling (which appears when you apply multiple filters), but it doesn't exclude the _ character (which appears when you apply a single filter).

As far as I know, tags won't solve this, as the issue is not with a specific group of products or categories; it's with the way the same category can be accessed via hundreds of URLs. That's why I imagine I can fix this if I can access the full URL of a page at render time.

 

0 Likes

I'm still not 100% sure what you're after. However, bear in mind that even though each product can be arrived at in a myriad of ways, it always has a canonical URL output in the HTML of the product page (<link rel="canonical" href="...">). Does that allow you to ignore the URL and pick up the product instead?

0 Likes

Also - the page will have the current_tags applied: https://shopify.dev/docs/themes/liquid/reference/objects/current-tags

 

0 Likes
gviner99
Tourist

I understand about the canonical tag, but that solves a problem with indexing, not with crawling.

In short, Google has a "crawl budget" and allocates a certain number of resources to crawling a website. As an SEO you want this to be as efficient a process as possible; if Google decides it is not efficient, it will ignore pages you want crawled and indexed. In this case there are key pages that are not being indexed, because crawling the site is so inefficient due to the multiple variations of collection pages resulting from filtering. Thus I need to add some more logic to the robots.txt file to manage this, and I believe the only way to do that is with a Block Robots command in the Liquid theme.

So back to my issue: when someone browses to /collections/shoes, I want the page crawled and indexed. But when new URLs are created, such as /collections/shoes/colour_red or /collections/shoes/size_5, it is important NOT to have those pages crawled, because they stop important pages from being crawled. Does that make sense?

I therefore need a way to apply pattern matching that captures the latter pages and applies a rule, while not applying it to the underlying collection pages.

Thanks

 

tim
Shopify Expert

This is an accepted solution.

Ultimately, you do not want your collection page URLs indexed if they are filtered with tags, right?

Basically, something like this, put inside <head> in your theme.liquid Layout should help:

{% if template.name == "collection" and current_tags %}
  <meta name="robots" content="noindex,nofollow" />
{% endif %}

Not sure, though, that this will preserve the crawl budget, since the page has already been retrieved and parsed by then?

Want to hire me to tweak a theme? Mail me at tairli@yahoo.com!
My post solved your problem? Like it!
0 Likes
gviner99
Tourist

Tim, you're a legend. That was what I was looking for in terms of pattern matching, and it allows me to stop those pages being indexed.

But it won't stop them from being crawled, as the robots.txt file can't be edited. Somebody somewhere said that the key to this was using this command, but I can't find anything anywhere about this mythical "Block Robots" command. Any ideas?


0 Likes
tim
Shopify Expert

The code I gave you (meta robots) is probably the closest you'll get to the mythical Block Robots command.

Also -- the sitemap obviously does not include these links, so I guess they are learned from crawling the collection pages or other links in your site's HTML. Possibly, adding rel="nofollow" to these links may help.
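As a sketch of that idea, if the theme renders its own tag-filter links, they could be output with rel="nofollow" like this (the loop and markup here are hypothetical; actual theme snippets vary):

```liquid
{% comment %}
  Hypothetical filter-link markup -- real themes differ.
  Each tag link points at the tag-filtered collection URL
  and carries rel="nofollow" so crawlers skip it.
{% endcomment %}
{% for tag in collection.all_tags %}
  <a href="{{ collection.url }}/{{ tag | handleize }}" rel="nofollow">{{ tag }}</a>
{% endfor %}
```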

Not sure, though; I'm not an SEO guru.

 

Want to hire me to tweak a theme? Mail me at tairli@yahoo.com!
My post solved your problem? Like it!
KieranR
Shopify Partner

If you want to prevent them being crawled, you could nofollow the internal links pointing to those pages from other pages. But unless it's creating huge crawl-budget issues, the noindex should be fine?

Full time Shopify SEO guy, based in NZ. Sometimes freelance outside the 9-5.
0 Likes