Building a Pinterest-like Image Crawler


Posted by Alex Peta on October 23, 2012

Image from Bing Images: Mooney Falls in Havasu Canyon, Arizona (© Brendan van Son/Tandem Stills + Motion)

Social networking sites keep looking for new ways to engage their users and to let them share and exchange information as quickly as possible, with as little effort as possible, from any platform and any device. Some even go beyond the “moral” boundaries and collect information without informing the user (cookie collection).

Leaving that aside, today I’m writing about the Pinterest share button. It has an interesting feature: you can drag and drop it onto your browser’s bookmarks bar, and on any site, clicking that button scans the HTML and asks the user which of the page’s images they want to “pin” to their profile.

So our goal today is to provide a proof of concept of how this is done. Let’s get coding.

First, the user goodies: the bookmark button.

Let’s first agree on what browser bookmarks are: character strings that are saved and then executed when clicked. Combine this with the fact that the HREF attribute of an anchor tag can hold JavaScript (a javascript: URL) instead of a regular URL, and we can conclude that we can run JavaScript from the bookmarks toolbar.
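To make this concrete, here is a minimal sketch of how a snippet of JavaScript can be turned into such a bookmark string. The helper name toBookmarkletHref is made up for illustration; it is not something browsers or Pinterest provide.

```javascript
// Hypothetical helper: wraps JavaScript source in a javascript: URL
// that can be used as an anchor's HREF or saved as a bookmark.
function toBookmarkletHref(source) {
	// The function wrapper keeps variables out of the page's global scope,
	// and void() makes the expression evaluate to undefined so the browser
	// does not replace the page with the script's return value.
	var wrapped = 'void((function(){' + source + '})());';
	// encodeURI turns spaces and other unsafe characters into %XX escapes.
	return 'javascript:' + encodeURI(wrapped);
}
```

In a browser, something like toBookmarkletHref("alert(document.title)") gives a string you could paste into the URL field of a new bookmark.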

Our bookmark button is an anchor tag with the following HREF and ONCLICK attributes:

href="javascript:void((function(){var%20e=document.createElement('script');e.setAttribute('type','text/javascript');e.setAttribute('charset','UTF-8');e.setAttribute('src',''+Math.random()*99999999);document.body.appendChild(e)})());" onclick="alert('Drag and drop this button to your browser bookmarks toolbar.'); return false;"

This piece of code does the following:

    • creates a new SCRIPT element
    • sets its type to JavaScript
    • sets its charset attribute
    • points the SRC attribute at the JavaScript file that will be injected into the target page (note: I have added a random number after the file name to prevent the browser from serving the file from cache and force it to fetch a fresh copy each time)
    • appends the SCRIPT element to the document (another thing to note here is how browsers behave: each time they “see” an “IMG” or “SCRIPT” tag, they read its SRC attribute and perform a GET request); this triggers the browser to download and run the JavaScript.
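The same steps read more easily when written out as a regular function. This is only a sketch: the function name injectScript, the example URL in the usage note and the "r" query parameter used as a cache buster are my assumptions, since the one-liner above does not show the file name.

```javascript
// Readable sketch of the loader. `doc` is the page's document object and
// `scriptUrl` is wherever the crawler script is hosted (an assumption).
function injectScript(doc, scriptUrl) {
	var e = doc.createElement('script');
	e.setAttribute('type', 'text/javascript');
	e.setAttribute('charset', 'UTF-8');
	// A random query value makes every URL unique, so the browser cannot
	// answer from its cache. The parameter name "r" is arbitrary.
	var separator = scriptUrl.indexOf('?') === -1 ? '?' : '&';
	e.setAttribute('src', scriptUrl + separator + 'r=' + Math.random() * 99999999);
	// Appending the element to the page is what starts the download.
	doc.body.appendChild(e);
	return e;
}
```

In a page you would call it as injectScript(document, 'https://example.com/imageCrawler.js'), with the URL replaced by the real location of your script.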

The imageCrawler

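var imageCrawler = {
	containerName : 'imageCrawlerContainer',
	// Builds the overlay HTML shown when the bookmark is clicked.
	GenerateContainer :  function() {
		return '<div class="imageCrawler-container" id="'+imageCrawler.containerName+'"><table><tr> <td> <span>Hey Joe, here are the images :</span></td><td style="width:160px;padding-top:5px;"><a href="javascript:imageCrawler.SaveBookmark();" title="Add to ImageCrawler">Add To ImageCrawler</a></td></tr><tr>'+
			// Assumption: the rest of this markup is missing from the listing; here the image rows are nested in the open row and every tag is closed.
			'<td colspan="2"><table>'+imageCrawler.InsertImages()+'</table></td></tr></table></div>';
	},
	// Removes the overlay from the page.
	DeleteContainer : function(){
		var element = document.getElementById(imageCrawler.containerName);
		var body = document.getElementsByTagName('body')[0];
		if(element != null)
			body.removeChild(element);
	},
	// Reports whether the host page already has jQuery loaded.
	HasJquery : function(){
		try {
			var jqueryIsLoaded=jQuery; // throws a ReferenceError when jQuery is not defined
			return true;
		} catch(err) {
			return false;
		}
	},
	// Placeholder for the real save call; removes the overlay after two seconds.
	SaveBookmark : function() {
		alert('Saving images ....etc..! Done!');
		var t = setTimeout(imageCrawler.DeleteContainer, 2000);
	},
	// Scans the page for images and returns one table row per image.
	InsertImages : function(){
		var imagesArr = document.getElementsByTagName('img');
		var rows = '';

		if(imagesArr == null || imagesArr.length==0)
			return '<tr><td><span style="">No Images found :(</span></td></tr>';
		for(var i=0;i<imagesArr.length;i++)
			if(imagesArr[i].src != null && imagesArr[i].src != '')
				rows += '<tr><td><img src="'+imagesArr[i].src+'"/></td></tr>';
		return rows;
	},
	// Inserts the overlay as the first child of the body.
	InitCrawler : function(){
		document.body.insertAdjacentHTML('afterBegin', imageCrawler.GenerateContainer());
	}
};

(function(){
	setTimeout(imageCrawler.InitCrawler, 50);
})();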

A few comments on the code above:

  • I’m creating an imageCrawler object here with a handful of methods, the main ones being:
    • GenerateContainer – generates the HTML that is shown to the user when the bookmark is clicked.
    • InsertImages – “scans” the page for images and builds the strings that will be displayed to the user.
    • the last lines of the script run as soon as it loads and call InitCrawler after a short delay to show the HTML.
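One possible variation on InsertImages, sketched below, keeps the row building separate from the DOM so it can be tried outside a browser. The names buildImageRows and escapeAttr are made up for this example, and both the attribute escaping and the skipping of repeated images are my additions rather than part of the crawler above.

```javascript
// Escapes characters that could break out of a double-quoted HTML attribute.
function escapeAttr(value) {
	return String(value)
		.replace(/&/g, '&amp;')
		.replace(/"/g, '&quot;')
		.replace(/</g, '&lt;')
		.replace(/>/g, '&gt;');
}

// Hypothetical DOM-free variant of InsertImages: accepts any array-like of
// objects with a `src` property and returns the table rows as one string.
function buildImageRows(images) {
	var rows = '';
	var seen = Object.create(null); // sources that already have a row
	for (var i = 0; i < images.length; i++) {
		var src = images[i].src;
		if (!src || seen[src]) continue; // skip empty and repeated sources
		seen[src] = true;
		rows += '<tr><td><img src="' + escapeAttr(src) + '"/></td></tr>';
	}
	// Fall back to the "no images" row when nothing usable was found.
	return rows || '<tr><td><span>No Images found :(</span></td></tr>';
}
```

In the page itself it could be fed the live collection, for example buildImageRows(document.getElementsByTagName('img')).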

To test this out, you can download the full source over on Codeplex: