ResearchBuzz!
Search Engine News and More Since 1998

February 25, 2006

Monitor Page Sections Via RSS Feeds With Feed43

Jim Stroud wanted a shout-out if I happened to write up Feed43, so here we go. WOO HOO JIM! Thanks for pointing me to Feed43, which answers a need I have felt for a while: "What if I want to monitor a page via an RSS feed, but the page doesn't have an RSS feed?" This site lets you monitor portions of a Web page and get the results via RSS. It's a little like the e-mail page monitor offered by the late, lamented SpyOnIt. Feed43 is currently in "private beta" (you need an invitation code) at http://feed43.com/. You can follow development updates on their blog at http://feed43.blogspot.com/.

While you do need an invitation code to use the service at the moment, apparently you don't need to register to monitor a specific page. After a lot of futzing around trying to load an Amazon page to monitor, I gave up: what Amazon was showing me in the browser and what Feed43 was loading were two entirely different animals. Okay, let's try something else: how many people subscribe to the ResearchBuzz feed in Bloglines? I'll start with the preview page: http://www.bloglines.com/preview?siteid=938. What I want to monitor is this phrase: "3,794 subscribers". I want to know whenever that number changes.

Still with me? Okay, because this is the tedious part. Feed43 reloads its page with the HTML source of the Bloglines URL in it. Click in the text box containing the HTML source and use your browser's Find command to locate the part of the page you're trying to monitor. (Having the source open in a separate browser window also makes it easier to go through.) In my case I searched for "subscribers" and found this code snippet (in the examples I've replaced < and > with { and } so the HTML shows up as code instead of being rendered):

{li}{a href="http://www.bloglines.com/userdir?siteid=938"}{strong}3,794 subscribers{/strong}{/a}{/li}
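If you'd rather not hunt through the source box by hand, here's a minimal Python sketch that does the same job as the Find command: it returns the text surrounding each occurrence of a search word. The `find_with_context` helper is hypothetical, and the sample page is a made-up stand-in built from the snippet above (with real angle brackets), not the full Bloglines page.

```python
def find_with_context(source: str, needle: str, width: int = 60) -> list[str]:
    """Return a slice of `source` around every occurrence of `needle`."""
    snippets = []
    start = source.find(needle)
    while start != -1:
        lo = max(0, start - width)
        hi = min(len(source), start + len(needle) + width)
        snippets.append(source[lo:hi])
        # Keep searching after the current hit.
        start = source.find(needle, start + 1)
    return snippets

# Hypothetical page source built around the snippet quoted above.
page = (
    '<ul><li><a href="http://www.bloglines.com/userdir?siteid=938">'
    '<strong>3,794 subscribers</strong></a></li></ul>'
)

for snippet in find_with_context(page, "subscribers"):
    print(snippet)
```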

Beneath the box of HTML source you'll find two more boxes, one labeled "Global Search Pattern:" and one labeled "Item (repeatable) Search Pattern". It's a little confusing: you're required to put something in the second box, BUT NOT in the first one. The first box, as I understand it, is for patterns that might occur over and over on a page, which you might want to divide into a list of RSS feed items. The second box is for a search pattern that locates a piece of text, not necessarily a repeated one, within the page source. In my case I want to match the phrase "3,794 subscribers", so in the second box I use {%} in place of that phrase, like so (the {%} placeholder really is written with curly braces; it isn't one of my angle-bracket substitutions):

{li}{a href="http://www.bloglines.com/userdir?siteid=938"}{strong}{%}{/strong}{/a}{/li}
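For the curious, here's a rough Python approximation of what a pattern like this does. It's an assumption about the general idea, not Feed43's actual code: everything in the pattern is matched literally, and each {%} becomes a lazy capture group. The `pattern_to_regex` helper is hypothetical, and the pattern below uses real angle brackets instead of the {} substitution used elsewhere in this article.

```python
import re

PLACEHOLDER = "{%}"

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Approximate a Feed43-style pattern as a compiled regex.

    Literal text between placeholders is escaped; each {%} becomes (.*?).
    """
    pieces = pattern.split(PLACEHOLDER)
    return re.compile("(.*?)".join(re.escape(p) for p in pieces), re.DOTALL)

item_pattern = (
    '<li><a href="http://www.bloglines.com/userdir?siteid=938">'
    '<strong>{%}</strong></a></li>'
)

# Hypothetical page source containing the snippet from the article.
page = (
    '<ul><li><a href="http://www.bloglines.com/userdir?siteid=938">'
    '<strong>3,794 subscribers</strong></a></li></ul>'
)

regex = pattern_to_regex(item_pattern)
# Apply the item pattern repeatedly, as the "Item (repeatable)" label suggests.
matches = [m.groups() for m in regex.finditer(page)]
print(matches)  # [('3,794 subscribers',)]
```

Splitting on the placeholder before escaping keeps the braces of {%} from being mangled by `re.escape`.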

Click Extract and Feed43 will show you what your search pattern matches: in my case, the phrase I was searching for. (The result I get is {%1} = 3,794 subscribers. It's labeled {%1} because you can match multiple items within a page.) This is a nice quick test of whether your search pattern is working. Be sure you match enough text that when this RSS feed shows up in your browser you'll have a clue what you're looking at (which is why I'm matching the word "subscribers" too, instead of just grabbing the number).
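To make the {%1} numbering a little more concrete, here's a small Python sketch. Again, it only approximates the behavior described above and is not Feed43's code: each placeholder in a pattern gets a number in order, and every repeated match of the item pattern yields its own set of numbered values. The `extract_numbered` helper and the two-entry sample page are hypothetical.

```python
import re

def extract_numbered(pattern: str, source: str) -> list[dict[str, str]]:
    """Return one dict per match, keyed {%1}, {%2}, ... in placeholder order."""
    pieces = pattern.split("{%}")
    regex = re.compile("(.*?)".join(re.escape(p) for p in pieces), re.DOTALL)
    results = []
    for match in regex.finditer(source):
        results.append(
            {"{%" + str(i) + "}": value
             for i, value in enumerate(match.groups(), start=1)}
        )
    return results

# Hypothetical page with two list entries, each holding a link and a label.
page = (
    '<li><a href="/a">First headline</a></li>'
    '<li><a href="/b">Second headline</a></li>'
)

for item in extract_numbered('<li><a href="{%}">{%}</a></li>', page):
    print(item)
# {'{%1}': '/a', '{%2}': 'First headline'}
# {'{%1}': '/b', '{%2}': 'Second headline'}
```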

Okay, you've done ALL THAT. Now you'll need to provide some information about the feed (ick, all the old Amazon URL information stayed on the feed setup page) and the Item Properties. This is annoying, especially if you're monitoring only a small snippet of data like I am; I wish there were a default option. Just imagine you're reading an item from an RSS feed. What do you read? An item title, a link to the content, and then the content itself. For this example I entered "How Many Bloglines Subscribers Does ResearchBuzz Have?" as the item title template, the original URL (http://www.bloglines.com/preview?siteid=938) as the link, and the result of the data extraction, {%1}, as the item content. Click Preview and you'll get an overview of what your feed looks like; in this case my feed looks like this:

How Many Bloglines Subscribers Does ResearchBuzz Have? 
http://www.bloglines.com/preview?siteid=938 
3,794 subscribers 
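Those three item properties map naturally onto the title, link, and description elements of an RSS 2.0 item. Here's a minimal Python sketch of that mapping, assuming standard RSS 2.0 element names and simple {%1} text substitution. It isn't Feed43's own output; the `fill` and `build_feed` helpers and the channel title are hypothetical.

```python
import xml.etree.ElementTree as ET

def fill(template: str, values: dict[str, str]) -> str:
    """Replace {%1}, {%2}, ... in a template with extracted values."""
    for key, value in values.items():
        template = template.replace(key, value)
    return template

def build_feed(channel_title: str, page_url: str, item_title: str,
               item_link: str, item_content: str,
               values: dict[str, str]) -> str:
    """Build a one-item RSS 2.0 document from item templates."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = channel_title
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = fill(item_title, values)
    ET.SubElement(item, "link").text = fill(item_link, values)
    ET.SubElement(item, "description").text = fill(item_content, values)
    return ET.tostring(rss, encoding="unicode")

url = "http://www.bloglines.com/preview?siteid=938"
feed_xml = build_feed(
    "ResearchBuzz on Bloglines",  # hypothetical channel title
    url,
    "How Many Bloglines Subscribers Does ResearchBuzz Have?",
    url,
    "{%1}",
    {"{%1}": "3,794 subscribers"},
)
print(feed_xml)
```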

Finally, you'll get the URL for your new feed, the option to protect it (which requires an e-mail address and a password) or make it private, and the option to get a summary e-mail.

This was a very basic example that extracts one snippet of information, but I can imagine using Feed43 for all kinds of things. Monitoring pages of interest in the Yahoo Directory springs to mind, since Yahoo offers only a limited number of RSS feeds for the directory. Or maybe content of interest on pages that a) offer no RSS feeds and b) carry so many rotating ads, refreshing dates, etc. that they easily trigger false positives in a Web page monitor. Really getting under the hood with Feed43 will require extensive help-file reading (how about a few more examples and tutorials on the site, guys?), but it's worth a look. I don't miss SpyOnIt quite so much anymore...
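Why does watching only a snippet help with pages full of rotating ads and refreshing dates? Here's a minimal Python sketch using made-up page versions. Comparing whole pages flags a change every time the ad or date rotates, while comparing only the extracted snippet flags a change only when the number you care about moves. The sample pages, the later subscriber count, and the regex are all hypothetical illustrations, not real Bloglines data.

```python
import hashlib
import re
from typing import Optional

SUBSCRIBERS = re.compile(r"<strong>([\d,]+ subscribers)</strong>")

def page_fingerprint(html: str) -> str:
    """Hash of the whole page: changes whenever anything on it changes."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def snippet(html: str) -> Optional[str]:
    """Just the part we care about, or None if it isn't on the page."""
    match = SUBSCRIBERS.search(html)
    return match.group(1) if match else None

# Three hypothetical fetches of the same page: the ad and date rotate every
# time, but the subscriber count changes only on the third fetch.
fetches = [
    "<p>Ad: Widgets!</p><p>Feb 25</p><strong>3,794 subscribers</strong>",
    "<p>Ad: Gadgets!</p><p>Feb 26</p><strong>3,794 subscribers</strong>",
    "<p>Ad: Gizmos!</p><p>Feb 27</p><strong>3,801 subscribers</strong>",
]

page_changes = 0
snippet_changes = 0
for previous, current in zip(fetches, fetches[1:]):
    if page_fingerprint(previous) != page_fingerprint(current):
        page_changes += 1
    if snippet(previous) != snippet(current):
        snippet_changes += 1

print(page_changes, snippet_changes)  # 2 1
```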

Posted to Internet-Technology-RSS

