First click free for web search and concrete5

Membership websites with private content have a problem when it comes to search engine ranking. One solution is 'first click free'.


Article by Ollie

If you run a membership website which hosts content not available to everyone, you have a problem. Search engines rank websites based on amount and quality of content. If they can't access your content they can't assess it - so it's contributing nothing to your ranking - noooooooo!

One approach to deal with this has become known as "First Click Free for Web Search". In a nutshell, you make all your web pages publicly viewable to search engines. This means search engine spiders can access and rank your content, and it necessarily means that visitors coming to your site from a page of search results also get to see that page of your website. I say 'necessarily' because search engines get very upset if you show them one version of a page on your site and then serve a different version of the same page to your visitors. By implication, you can't have your page in the search index, all ranked up, and still protect that page with a login prompt when your human visitors click your link in the search results.

Hence the naming, 'First Click Free'.

Before I get into how you might do this in concrete5, let me deal with a couple of issues now - before you jump on me in the comments.

Firstly, as much as that Google blog post might make it sound like it, 'First Click Free' isn't a technology, it's not anything Google are doing for you, it's simply an approach - one that you need to implement for yourself - there's no magic snippet of JavaScript you can paste into your HTML.

Secondly, whether this approach is useful to you really depends on your membership model, since it's very easy to circumvent. If your content is valuable, people might take the time to spoof their request headers so that your web server thinks they're a search engine and shows them your page.
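To make the spoofing point concrete, here is a minimal sketch of a user-agent test written as a plain PHP function. The function name looksLikeSpider() and the sample user-agent strings are made up for illustration and are not part of concrete5. The check can only trust what the header claims, so an edited header passes just as easily as a real crawler does.

```php
<?php
// Hypothetical helper (not part of concrete5): a naive spider test that
// trusts whatever the User-Agent header claims.
function looksLikeSpider($userAgent)
{
    $userAgent = strtolower((string) $userAgent);
    foreach (array('googlebot', 'bingbot', 'yahoo') as $needle) {
        if (strpos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// A real crawler and an ordinary visitor who has edited their header
// are indistinguishable to this check.
$genuine = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
$spoofed = 'Mozilla/5.0 (X11; Linux x86_64) Googlebot/2.1';
$browser = 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0';

var_dump(looksLikeSpider($genuine)); // bool(true)
var_dump(looksLikeSpider($spoofed)); // bool(true)
var_dump(looksLikeSpider($browser)); // bool(false)
```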

Though we use this approach on c5Hub, we're only asking people to register on the site, and that's free! If someone would rather take the time to spoof request headers to get at our stuff than take a few seconds to register, then I say "more power to you", closely followed by, "shouldn't you get some professional help?"

Final thing. Referrer. It's a bit hit and miss. Although it can be spoofed, it should, in principle, be the page the browser was on immediately before it hit your page.

However...

Referrer information is tough to obtain these days, or should be. Most search engines are moving to HTTPS (SSL). Strictly speaking, unless your website is also running over SSL, you shouldn't receive this information in the request headers. As of today, in my tests, only Bing did not pass the referrer to a non-SSL website; both Google and Yahoo (UK) still do, but may not do so forever. So what you have is a compelling argument for moving your membership site to HTTPS, since you will then receive referrer information.

So, with all of that 'dealt with', let's look at how we might implement this 'approach' in concrete5. We want to intercept each request, check if it qualifies to be shown the page on the basis of the aforementioned criteria, and serve the page if it does. Sounds sensible?

So first, intercept the page request. In concrete5 we can do this by piggy-backing on concrete5's on_start event. To do this in a base concrete5 installation, we'll need to create a new file called site_events.php, place it in the /config directory, and add code similar to the following to it:

Events::extend('on_start', 
               'firstClickFree',
               'onStart',
               '/libraries/fcfree.php'
              );

In the example we're extending the on_start event, the class name is 'firstClickFree' and the class method we want to execute is 'onStart'. The final parameter specifies the location of the file containing this class - here the file is stored in our /libraries directory. 

Note: if we were doing this in a package, we could add the same code to the package controller's on_start() method instead, with no need for the site_events.php file. If the class file is also held in the package, modify the path to the file accordingly.

So let's have a look at the onStart() method of our firstClickFree class:

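class firstClickFree {
    // Runs on concrete5's on_start event: temporarily opens a members-only
    // page to guests when the request qualifies for first click free.
    public static function onStart() {
        if (self::isAllowed()) {
            $c = Page::getCurrentPage();
            $pkHandles = array('view_page');
            $viewPerms = self::getViewPerms();
            $guestView = in_array(GUEST_GROUP_ID, $viewPerms);
            $regView = in_array(REGISTERED_GROUP_ID, $viewPerms);
            // Only touch pages that registered users can view but guests cannot
            if ($regView && !$guestView) {
                $c->assignPermissions(Group::getByID(GUEST_GROUP_ID), $pkHandles);
                // Reload the page; fcf=1 flags it for clean-up after rendering
                $nh = Loader::helper('navigation');
                header('Location: ' . $nh->getCollectionURL($c) . '?fcf=1');
                die();
            }
        }
    }
//...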

Our method calls an isAllowed() method (covered below). If it returns true, the current page object is grabbed and, if the page is currently viewable by registered users but not by guests, its permissions are set so that 'guests' (users who are not logged in) can view it. The request is then redirected back to the same page with fcf=1 added to the URL.

Our isAllowed() method might look something like this. Remember we need to perform two tests. One to establish if our visitor is a search engine spider, and a second to test if our visitor has come from a search engine.

//...
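    private static function isAllowed() {
        // User agent (spider) test; the header may be missing, so default to ''
        $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
        foreach (array('googlebot', 'yahoo', 'bingbot') as $spider) {
            if (strpos($userAgent, $spider) !== false) {
                return true;
            }
        }
        // Referrer test; the header is optional and may not parse to a host
        $refer = isset($_SERVER['HTTP_REFERER']) ? parse_url($_SERVER['HTTP_REFERER']) : false;
        $host = (is_array($refer) && isset($refer['host'])) ? strtolower($refer['host']) : '';
        if (strpos($host, 'google') !== false || strpos($host, 'yahoo') !== false) {
            return true;
        }
        return false;
    }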
//...
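The referrer test above accepts any host that merely contains 'google' or 'yahoo', so a host such as notgoogle.example.com would pass. A slightly stricter variant might look like the sketch below. It is a standalone function that takes the referrer as an argument so it can be tried outside concrete5; the name isSearchReferrer() and the pattern are assumptions for illustration, and the pattern is only a heuristic (a three-letter label after "google", as in google.abc.com, would still match). The header itself can still be spoofed either way.

```php
<?php
// Hypothetical helper: true when $referrer's host looks like a Google or
// Yahoo domain (any subdomain, one or two short TLD labels), false otherwise
// or when the referrer is missing or malformed.
function isSearchReferrer($referrer)
{
    if (!is_string($referrer) || $referrer === '') {
        return false;
    }
    $host = parse_url($referrer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL or no host component
    }
    // Match "google" or "yahoo" only as a whole label directly before the
    // TLD part, e.g. www.google.co.uk or uk.search.yahoo.com.
    $pattern = '/(^|\.)(google|yahoo)(\.[a-z]{2,3}){1,2}$/';
    return preg_match($pattern, strtolower($host)) === 1;
}

var_dump(isSearchReferrer('https://www.google.co.uk/search?q=concrete5')); // bool(true)
var_dump(isSearchReferrer('http://notgoogle.example.com/'));              // bool(false)
var_dump(isSearchReferrer(null));                                          // bool(false)
```

Inside isAllowed(), a function like this could be called with the value of $_SERVER['HTTP_REFERER'] when that key is set.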

Next we need a method that returns the view permissions of the page. If a page is viewable by registered users only, we need to add guest view permissions. We will also use this method later, when removing the guest permissions, to pick out pages that have both guest and registered user view access, and to ignore pages that are guest-only: those are public, so we don't want to remove their guest permissions.

//...
    private static function getViewPerms() {
        // Collect the IDs of all groups assigned the view_page permission
        $viewPerms = array();
        $c = Page::getCurrentPage();
        $pk = PermissionKey::getByHandle('view_page');
        $pk->setPermissionObject($c);
        $assignments = $pk->getAccessListItems();
        foreach ($assignments as $asi) {
            $ae = $asi->getAccessEntityObject();
            // Only group-based assignments matter here
            if ($ae->getAccessEntityTypeHandle() == 'group') {
                $group = $ae->getGroupObject();
                if (is_object($group)) {
                    $viewPerms[] = $group->getGroupID();
                }
            }
        }
        return $viewPerms;
    }
//... 

So far we have enough to detect if the referrer or user agent is a search engine, to check if the page is a private page, and to add guest permissions if it is. The problem is that our page is now public and viewable by the whole world. Not really what we want for our membership model! We need to revert the permissions back to private.

To do this we implement some more functionality in this class and trigger it on the back of another of concrete5's built-in events, namely the on_render_complete event which, as you may have guessed, fires after the page has been displayed.

All we're doing here is removing the guest view permissions again:

//...
    public static function onRenderComplete() {
        // Only clean up requests that onStart flagged with ?fcf=1
        if (!empty($_REQUEST['fcf'])) {
            $c = Page::getCurrentPage();
            $pkHandles = array('view_page');
            $viewPerms = self::getViewPerms();
            $guestView = in_array(GUEST_GROUP_ID, $viewPerms);
            $regView = in_array(REGISTERED_GROUP_ID, $viewPerms);
            if ($regView && $guestView) {
                // Take the guest view permission away again
                $c->assignPermissions(Group::getByID(GUEST_GROUP_ID), $pkHandles, 0);
                $c->refreshCache();
            }
        }
    }

And the event listener:

Events::extend('on_render_complete', 
               'firstClickFree',
               'onRenderComplete',
               '/libraries/fcfree.php'
              );

To summarise, we've made the page public, rendered the page, and then made the page private again when we've finished displaying it. Snazzy.

Our membership model is intact, but Google et al know all about the excellent content on our private page. Anyone hitting the page from a link in search engine results gets to read that one page, while subsequent clicks around our site will require login/registration.

You will probably need to enhance this example to test for pages of a certain page type, before making public and particularly before making private again. Without any such test this code would fire on all pages and make private any pages that you actually intend to be guest viewable such as your site's privacy policy, terms of use or even your homepage!
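One way to sketch such a test is an allow-list of page type handles, checked at the top of both onStart() and onRenderComplete() so that neither method touches pages outside the list. The helper name isFirstClickFreeType() and the handles 'article' and 'tutorial' are made-up examples. The sketch assumes the handle would come from the current page object, for example via $c->getCollectionTypeHandle(), which you should check against your concrete5 version.

```php
<?php
// Hypothetical helper: only pages whose type handle is on the allow-list
// take part in first click free. The handles used below are made-up examples.
function isFirstClickFreeType($pageTypeHandle, array $allowedHandles)
{
    return is_string($pageTypeHandle)
        && in_array($pageTypeHandle, $allowedHandles, true);
}

$allowed = array('article', 'tutorial');

// In concrete5 the handle would be read from the current page
// (assumed: $c->getCollectionTypeHandle()).
var_dump(isFirstClickFreeType('article', $allowed)); // bool(true)
var_dump(isFirstClickFreeType('home', $allowed));    // bool(false)
var_dump(isFirstClickFreeType(null, $allowed));      // bool(false)
```

Returning early from both event methods when this check fails leaves your homepage, privacy policy and other public pages alone.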

Hope you found this interesting. Thanks for reading.
