the page at docs/search_results.html should be more hidden (exclude it from search results; add to a robots.txt) #1027

ebbeck · Mar 11, 2016

https://schema.org/docs/search_results.html

this leads to a blank page. Is this a valid schema tag?

Dataliberate · Mar 11, 2016

Have you tried searching for something when you arrive at that page?

~Richard

danbri · Mar 16, 2016

Hmm, thanks @ebbeck - you found a bug in the structure of our site, I think.

It looks like the file at docs/search_results.html is not meant for people to find. Instead it is a template used in the search box at the top of all pages. I'll update the title of this issue to track the underlying problem.

danbri · Aug 19, 2016

http://webschemas.org/robots.txt

Will go out with next release to schema.org.

Aaranged · Mar 2, 2017

@danbri The content of http://webschemas.org/robots.txt as currently coded instructs the search engines not to index any content on webschemas.org.

The correct markup to exclude only the search results page is:
User-agent: *
Disallow: /docs/search_results.html
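
As a quick sanity check, the difference between a block-all file and the narrower rule above can be sketched with Python's standard `urllib.robotparser`. The block-all text here is an assumption about what the webschemas.org file contained (a bare `Disallow: /`), and the `/Person` URL is only an illustrative ordinary page:

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Assumed content of the block-all variant (not copied from the repo).
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# The narrower rule proposed in this comment.
BLOCK_SEARCH_ONLY = "User-agent: *\nDisallow: /docs/search_results.html\n"

block_all = parse_rules(BLOCK_ALL)
block_search_only = parse_rules(BLOCK_SEARCH_ONLY)

search_page = "https://schema.org/docs/search_results.html"
normal_page = "https://schema.org/Person"  # illustrative ordinary page

for name, rules in [("block-all", block_all), ("search-only", block_search_only)]:
    print(name, rules.can_fetch("*", search_page), rules.can_fetch("*", normal_page))
```

With the narrower rule, only the search results template is disallowed for crawlers; every other path stays fetchable.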

danbri · Mar 2, 2017

Thanks @Aaranged - eagle-eyed as ever. In this case @RichardWallis and I decided it was best not to confuse things by having the webschemas draft site show up in search results. Depending on whether the site is running in "official" mode or webschemas-etc mode, we serve a different robots.txt - https://github.com/schemaorg/schemaorg/blob/sdo-callisto/docs/robots-blockall.txt vs https://github.com/schemaorg/schemaorg/blob/sdo-callisto/docs/robots.txt

The gory App Engine details are in the corresponding *.yaml files. Amongst other things, the official site version should serve a simple sitemap...

AymenLoukil · Mar 3, 2017

Hello all,

The recommended way to keep a page out of the index is a robots meta tag.

We should add <meta name="robots" content="noindex"> to the <head> of https://github.com/schemaorg/schemaorg/blob/sdo-callisto/docs/search_results.html

Difference between the two methods:
robots.txt: "Please don't crawl this page or folder", but the URL can still be shown in your index.
Robots meta tag: "You may visit this page, but you must not keep it in your index."
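
To check whether a given HTML file actually carries such a tag, something like the following could be used. It is a minimal sketch built on the standard `html.parser` module; the class and function names (`RobotsMetaFinder`, `has_noindex`) are made up for illustration and are not part of the schemaorg codebase:

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        content = attr.get("content") or ""
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def has_noindex(html: str) -> bool:
    """Return True if the document asks crawlers not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


with_tag = '<html><head><meta name="robots" content="noindex"></head></html>'
without_tag = "<html><head><title>Search</title></head></html>"
print(has_noindex(with_tag), has_noindex(without_tag))
```

A crawler has to fetch the page to read this tag, so it only takes effect if the same page is not also disallowed in robots.txt.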

danbri changed the title from search results page is empty on schema.org to the page at docs/search_results.html should be more hidden (exclude it from search results; add to a robots.txt) Mar 16, 2016

danbri added type:bug site tools + python code labels Mar 16, 2016

danbri pushed a commit that closed this issue Aug 19, 2016

Dan Brickley Added a basic robots.txt to exclude search_results.html template.
Fixes #1027
0232e7a

danbri closed this in 0232e7a Aug 19, 2016

danbri pushed a commit that referenced this issue Aug 19, 2016

Dan Brickley Noted robots.txt creation.
See #1027
281b557
