Using robots.txt

As Open Orchestra is a multisite platform, you have to manage several robots.txt files for a single installation. To simplify the maintenance of their content, they can be edited directly in the Back Office. But contribution is only the first step: to be available, this information must be dumped into files and correctly served to the client.

Creation of robots.txt files with Open Orchestra is achieved in three steps: files contribution, files generation and files routing.

Files contribution

Contributing the content of a robots.txt file is straightforward. As it is specific to each website, you do it in the website edit form, located in the Administration submenu. Fill in the dedicated textarea: the content of the robots.txt file will be exactly what is written there.

[Image: contributing the robots.txt content in the Back Office (../_images/contributing_robots.png)]

Files generation

Once contributed, the content of the robots.txt is only stored in the database. To serve it to a client requesting it, two approaches are possible: use a controller action that looks up in the database the information matching the current site, or serve a previously generated static file containing this information.

To improve performance, Open Orchestra uses the second option. This solution avoids any PHP processing and database access. As these files are static, they can also easily be cached by a reverse proxy.

Once the robots.txt content is contributed, you have to dump it into a file manually. Open Orchestra provides a console command, available on your Front installation:

app/console orchestra:robots:generate [--siteId=SITEID]

This command dumps one or all of the robots.txt files, depending on whether the siteId parameter is present. To be accessible to web clients, the generated files are stored in the web directory of your application, each in a subdirectory named after the site id. For instance, if you have three sites whose ids are ‘site-1’, ‘site-2’ and ‘site-3’, the robots.txt files will be dumped here:

  • web/site-1/robots.txt
  • web/site-2/robots.txt
  • web/site-3/robots.txt
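
For example, using the site ids above, you can dump the file of a single site or all of them at once:

app/console orchestra:robots:generate --siteId=site-2
app/console orchestra:robots:generate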

A good practice is to run this command in a cron job to periodically refresh the content of the files.
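
As a sketch, the following crontab entry regenerates all the robots.txt files every hour (the PHP binary and application paths are assumptions to adapt to your installation):

# Assumed paths: adjust /usr/bin/php and /var/www/orchestra to your setup
0 * * * * /usr/bin/php /var/www/orchestra/app/console orchestra:robots:generate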

Files routing

When a client requests robots.txt, it expects to find it at the root of the domain. But the files are generated somewhere else, in a subfolder named after the site id. As a result, the web server must be configured to rewrite the request to the matching file: this can be done with a rewrite rule. The exact configuration depends on your web server; on Apache, for instance, this is done via the Virtual Host mechanism.

Here is a configuration example for Apache, in the case of a site with id ‘my-site’:
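
A minimal sketch of such a Virtual Host (the domain name and DocumentRoot below are assumptions to adapt to your installation, and mod_rewrite must be enabled):

<VirtualHost *:80>
    # Assumption: adapt the domain name and paths to your installation
    ServerName www.my-site.example
    DocumentRoot /var/www/orchestra/web

    RewriteEngine On
    # Serve the generated file when robots.txt is requested at the root of the domain
    RewriteRule ^/robots\.txt$ /my-site/robots.txt [L]
</VirtualHost>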

Once Apache is reloaded, the rewrite rule is used and the robots.txt file is accessible.
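
To verify the routing, you can request the file directly (the domain name here is an assumption):

curl http://www.my-site.example/robots.txt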

Any further modification changes the file content but not its name, so you don’t have to modify the rewrite rule: you can update the file periodically without having to restart the web server.