the page at docs/search_results.html should be more hidden (exclude it from search results; add to a robots.txt) #1027

Closed
ebbeck opened this Issue Mar 11, 2016 · 6 comments


@ebbeck
ebbeck commented Mar 11, 2016

https://schema.org/docs/search_results.html

this leads to a blank page. Is this a valid schema tag?

@Dataliberate
Contributor

Have you tried searching for something when you arrive at that page?

~Richard


@danbri
Contributor
danbri commented Mar 16, 2016

Hmm, thanks @ebbeck - you found a bug in the structure of our site, I think.

It looks like the file at docs/search_results.html is not meant for people to find. Instead it is a template used in the search box at the top of all pages. I'll update the title of this issue to track the underlying problem.

@danbri danbri changed the title from search results page is empty on schema.org to the page at docs/search_results.html should be more hidden (exclude it from search results; add to a robots.txt) Mar 16, 2016
@danbri danbri pushed a commit that closed this issue Aug 19, 2016
Dan Brickley Added a basic robots.txt to exclude search_results.html template.
Fixes #1027
0232e7a
@danbri danbri closed this in 0232e7a Aug 19, 2016
@danbri
Contributor
danbri commented Aug 19, 2016

http://webschemas.org/robots.txt

Will go out with next release to schema.org.

@danbri danbri pushed a commit that referenced this issue Aug 19, 2016
Dan Brickley Noted robots.txt creation.
See #1027
281b557
@Aaranged
Aaranged commented Mar 2, 2017

@danbri The content of http://webschemas.org/robots.txt as currently coded instructs the search engines not to index any content on webschemas.org.

The correct directives to exclude only the search results page are:
User-agent: *
Disallow: /docs/search_results.html
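
To illustrate the difference between the two rule sets discussed here, the following is a minimal sketch using Python's standard `urllib.robotparser`. The `BLOCK_ALL` content is an assumption about what a block-everything robots.txt looks like (the actual file served by webschemas.org is not quoted in this thread), and the helper name `allowed` and the example URLs other than the search results page are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Assumed shape of a robots.txt that blocks the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# The narrower rule proposed in the comment above.
BLOCK_SEARCH_ONLY = "User-agent: *\nDisallow: /docs/search_results.html\n"

if __name__ == "__main__":
    for label, rules in (("block-all", BLOCK_ALL), ("search-only", BLOCK_SEARCH_ONLY)):
        for path in ("/docs/search_results.html", "/docs/schemas.html"):
            url = "https://schema.org" + path
            print(label, path, "allowed" if allowed(rules, url) else "blocked")
```

Under these assumptions, the block-all file disallows every path, while the narrower file disallows only the search results template.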

@danbri
Contributor
danbri commented Mar 2, 2017

Thanks @Aaranged - eagle eyed as ever. In this case @RichardWallis and I decided it was best not to confuse things by having the webschemas draft site show up. Depending on whether the site is running in "official" mode or webschemas-etc mode, we serve a different robots.txt - https://github.com/schemaorg/schemaorg/blob/sdo-callisto/docs/robots-blockall.txt vs https://github.com/schemaorg/schemaorg/blob/sdo-callisto/docs/robots.txt

The gory App Engine details are in the corresponding *.yaml files. Amongst other things, the official site version should serve a simple sitemap...

@AymenLoukil
AymenLoukil commented Mar 3, 2017 edited

Hello all,

The recommended method to block a page from being indexed is a robots meta tag.

We should add <meta name="robots" content="noindex"> to the header of https://github.com/schemaorg/schemaorg/blob/sdo-callisto/docs/search_results.html

Difference between the two methods:
Robots.txt: Please don't crawl this page/folder, but you can continue to show it in your index.
Robots meta tag: You can visit this page/folder, but you are not authorized to keep it in your index.
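
As a rough way to check whether a page carries the suggested tag, here is a small sketch using Python's standard `html.parser`. The class and function names (`RobotsMetaParser`, `has_noindex`) are hypothetical, and the sample HTML is made up for illustration; it is not the actual content of search_results.html.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated and case-insensitive.
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with `noindex`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex(sample))
```

Note that a crawler can only see this tag if it is allowed to fetch the page, so a page blocked in robots.txt would not have its meta tag read.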
