Author Topic: robots.txt for the AYP website  (Read 1522 times)

yogamonster

  • Posts: 20
robots.txt for the AYP website
« on: March 22, 2014, 03:40:09 PM »
Hi, I noticed that when I use Google to search for certain topics, the AYP website returns pages titled "AYP Support Forums - Advanced Yoga Practices", all of which have URLs in the following format:

www.aypsite.org/forum/pop_printer_friendly.asp?TOPIC_ID=(number)

I think it would be better to configure the robots.txt file so that, instead of the above, the following is displayed:

www.aypsite.org/index.php?topic=(number)

For example take a look here:

https://www.google.com/search?q=sixth+sense+site:aypsite.org

Cheers!
« Last Edit: April 24, 2015, 10:23:46 AM by yogani99 »

yogani

  • Posts: 6025
« Reply #1 on: March 23, 2014, 12:55:00 AM »
Hi YM:

So what you are saying is that certain Google searches are going to the print pages, rather than to the actual forum topics? (Fortunately, the print pages include links to the actual topics.)

aypsite.org and the forums are site mapped with code set up by our tech consultant years ago. As far as I know, the site mapping is working, because the actual forum topics receive a lot of traffic from Google.

I will have our tech consultant look into the anomaly you mentioned to see what is going on. It could be that certain public pages, like the print pages, will come up with particular search term configurations like you tried. But not all searches will be like that, and hopefully most are going to the actual forum topics. I don't know that we could account for every variation in search terms that might lead to public pages in the forum that are not actual topics. There are thousands of search term variations and thousands of forum pages. But we will take a look.

Thanks for pointing this out.

The guru is in you.

PS: Whether a robots.txt file is a solution or not is for our tech consultant to decide. I only know how to use it for blocking search engines, and we are certainly not going to do that.


yogani

  • Posts: 6025
« Reply #2 on: March 23, 2014, 05:59:08 AM »
Hi YM:

The feedback from our tech consultant is that it is not possible to predict what Google's ranking algorithms will produce, and second-guessing them could be counterproductive. Who knows why Google may sometimes rank a print page above the actual page? (At least we have the real link on the print page.) We don't want to block those print pages from Google, because it could adversely affect the visibility of related pages.

You did not mention how widespread this issue is. If it involves hundreds or thousands of pages, we can take a closer look. But if it is affecting only a few pages, then it is probably best to leave it alone.

Thanks!

The guru is in you.


yogamonster

  • Posts: 20
« Reply #3 on: March 31, 2014, 02:17:32 PM »
Fair enough, I guess. Currently there's no robots.txt on the AYP site as far as I can tell:

http://www.aypsite.com/plus/robots.txt

Preventing print pages from being crawled is easy enough - just create a text file named robots.txt at the root of the site (crawlers only look for it at /robots.txt, not in a subfolder) and add something like this:

User-agent: *
Disallow: /forum/pop_printer_friendly.asp
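
A rule like this can be sanity-checked before it goes live. The sketch below uses Python's standard urllib.robotparser module; the topic number 123 and the helper name is_blocked are made up for illustration, and the /forum/topic.asp address is an assumption about where the main topic pages live. Since Disallow matches from the start of the URL path, a rule only covers the print pages if it includes the /forum/ folder seen in the search results.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(disallow_path, url, user_agent="*"):
    """Return True if a one-rule robots.txt with this Disallow path blocks the URL."""
    parser = RobotFileParser()
    # Feed the rules in directly instead of fetching a live robots.txt.
    parser.parse([
        "User-agent: *",
        f"Disallow: {disallow_path}",
    ])
    return not parser.can_fetch(user_agent, url)


# Hypothetical topic number, used only for illustration.
print_page = "http://www.aypsite.org/forum/pop_printer_friendly.asp?TOPIC_ID=123"
# Assumed location of the matching main topic page.
topic_page = "http://www.aypsite.org/forum/topic.asp?TOPIC_ID=123"

# Rule that includes the /forum/ folder.
print(is_blocked("/forum/pop_printer_friendly.asp", print_page))
print(is_blocked("/forum/pop_printer_friendly.asp", topic_page))
# Rule without the folder: the path prefix no longer matches the print page.
print(is_blocked("/pop_printer_friendly.asp", print_page))
```

With the folder included, the print page should come back as blocked and the topic page as allowed; without the folder, the print page is not matched at all.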

I don't think doing this will have negative consequences in terms of ranking because the main pages beginning with /topic.asp have the same content. Seems that searching for minor topics on Google usually produces this kind of "anomaly":

https://www.google.com/search?q=site:aypsite.org+weor

Anyway, it's obviously up to you and the tech consultant to decide what to do with it - just wanted to report the issue because I was experiencing it every other time I did a site search on Google.

yogani

  • Posts: 6025
« Reply #4 on: April 01, 2014, 01:23:54 AM »
Hi YM:

The question is, if blocked, where will those "searches for minor topics" go? If not to AYP, then it would be a net loss of traffic.

Correct, there is no robots.txt file on aypsite.org. We do have one blocking the entire aypsite.com site from the search engines, because mirror sites are not permitted by Google, and both would be blocked by them if one was not blocked by us. Years ago, aypsite.org was chosen as the search site, because it includes both the lessons and the forums, whereas aypsite.com is the lessons only. As mentioned before, aypsite.org (including the forum) is site mapped for improved search engine exposure.

The thing with doing search engine access modifications is that no one knows what the Google algorithms are, and there is risk in it. We could end up shooting ourselves in the foot for the sake of a few errant search results, which have links to the corresponding topics anyway. Very often, "upgrades" result in unexpected downsides due to unknown factors in the technology.

So from the perspective of a long-standing, well-established search engine exposure program, this is a case of "If it ain't broke, don't fix it." At least not for now...

Thanks for poking around. We are always looking for ways to improve the AYP resources and their visibility. But at this stage we are not inclined to do modifications that offer a small benefit coupled with a potentially big downside risk.

The guru is in you.


Yogaman

  • Posts: 290
« Reply #5 on: April 01, 2014, 01:46:18 AM »
Most likely what you want to do is redirect these links to the main pages, not block them. This can be done with what are called "regular expressions", which grab all pages that match a pattern in the URL structure. Redirecting pages is common, as people move content and rename pages (and entire websites). I've done this plenty of times with my business site, which relies heavily on ranking high in Google results, and have never seen a negative impact.
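
To make the pattern idea concrete, here is a rough sketch in Python. In practice the redirect would live in the web server or forum software rather than in a script like this. The target format /forum/topic.asp?TOPIC_ID=(number) is an assumption based on the URLs mentioned in this thread, and redirect_target is a made-up helper name.

```python
import re

# Matches the print-page path and captures the topic number.
PRINT_PAGE = re.compile(
    r"^/forum/pop_printer_friendly\.asp\?TOPIC_ID=(\d+)$",
    re.IGNORECASE,
)


def redirect_target(path_and_query):
    """Return the main topic path for a print-page path, or None if it is not one."""
    match = PRINT_PAGE.match(path_and_query)
    if match is None:
        return None
    # Assumed layout of the main topic pages.
    return f"/forum/topic.asp?TOPIC_ID={match.group(1)}"


# Hypothetical topic number, used only for illustration.
print(redirect_target("/forum/pop_printer_friendly.asp?TOPIC_ID=123"))
print(redirect_target("/forum/topic.asp?TOPIC_ID=123"))
```

One thing to weigh: a blanket redirect also sends human visitors away from the print view, so it only fits if the print pages are not needed for actual printing.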

Your comment regarding duplicate content is correct, but keep in mind that having both a webpage and the print page show up in search results might be seen by Google as duplicate content (which it is).

That said, I also agree with "if it ain't broke don't fix it". But as Wayne Gretzky said, you miss 100% of the shots you don't take :)

The question is whether Google penalizes for this, and it is probably one area where they do reveal their stance. The other question: if it is a pain for new visitors, does it turn them off from exploring AYP's website?

Yogaman

  • Posts: 290
« Reply #6 on: April 01, 2014, 01:51:02 AM »
Here's some great info on the topic, and some associated solutions: http://moz.com/learn/seo/duplicate-content

https://yoast.com/articles/duplicate-content/

It seems the rel=canonical solution is the ideal one here.
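
For anyone unfamiliar with it, rel=canonical is a tag in the head of the duplicate page that names the preferred URL. Here is a small Python sketch of what the print page would need to emit. The /forum/topic.asp target is an assumption about where the main topic pages live, and canonical_link is a made-up helper name; the real change would go into the forum's print-page template.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def canonical_link(print_url):
    """Build the <link rel="canonical"> tag a print page could carry in its <head>."""
    parts = urlsplit(print_url)
    topic_ids = parse_qs(parts.query).get("TOPIC_ID")
    if not topic_ids:
        return None  # Not a print-page URL with a topic number.
    # Assumed layout of the main topic pages.
    canonical = f"{parts.scheme}://{parts.netloc}/forum/topic.asp?TOPIC_ID={topic_ids[0]}"
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


# Hypothetical topic number, used only for illustration.
print(canonical_link("http://www.aypsite.org/forum/pop_printer_friendly.asp?TOPIC_ID=42"))
```

Unlike a robots.txt block, this leaves the print pages crawlable and usable, and only hints to the search engine which version to show.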
« Last Edit: April 01, 2014, 01:56:37 AM by Yogaman »

yogani

  • Posts: 6025
« Reply #7 on: April 02, 2014, 02:43:06 AM »
Hi YM and YM:

We will look into this some more. We don't want multiple results for forum pages turning up often in Google, as it could reduce exposure.

Thanks!

TGIIY