
One of These Pages Is Sorta Like the Other

I don’t know if I would use this tool to find derivative content on Wikipedia, but it’s an interesting way to browse. Similpedia, at http://similpedia.org/index.html, allows you to enter a URL or a block of text and get Wikipedia entries that are similar to what you’ve entered. In the tests I ran, I didn’t get results so similar that there was a lot of matching text, but I got results that were closely enough related that the searches led to interesting browsing.

So the Idaho Potato Commission Web site is http://www.idahopotato.com/ . I did a search for that URL at Similpedia, and got a list of ten pages. The pages ranged from Almond Potato to Blackfoot, Idaho, to Potato Paradoxes, to Fauxtato. (Fauxtato?) Eight of the results were potato-oriented, with only two results Idaho-oriented (including the page for Idaho’s 2nd congressional district.)

I then took an excerpt from a FAQ on the Web site, the following text:

“Potatoes should not be frozen from a raw state. They will turn black and the texture will be soft upon thawing. Potato dehydrators (who earlier had invented the instant mashed potato) struggled with any attempts to create a frozen French fry by cutting up the potatoes and attempting to freeze them. Frequently they would just turn to water and mush, similar to trying to freeze a fresh onion whole. The scientist that solved all this was Ray Dunlap, who worked for the J.R. Simplot company in Idaho. He discovered that by precooking or blanching the potatoes this stabilized them and made it possible to freeze them to be thawed later without breaking down the cellular structure. This happened in the late forties to early fifties. One of Simplots biggest customers later on was Mc Donald’s which made their fries fresh. As they grew, the labor time and the convenience of using a frozen fry outweighed this signature fry and they initially switched over to Simplot product exclusively. Now frozen fries are the most commonly used form of potatoes for consumer consumption. Potatoes are inexpensive, so either pre cook them and then freeze or toss when you have too many and they have started to have a skin that wrinkles or sprouts.”

I ran that text through Similpedia and got nine potato-oriented results, including cooking methods, brands, and that fauxtato thing again. But I also got one odd result: the page for Hamburger Station.

The excerpt above is over 200 words, and it worked well. However, when I tried shorter paragraphs of 50 and 75 words, I got much more inconsistent (almost useless) results. I’d recommend blocks of at least a hundred words at a time.
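If you want to check a block of text before pasting it in, a quick word count will do. This is only a rough sketch: the function names are mine, the 100-word threshold is just my rule of thumb from the tests above, and nothing here talks to Similpedia itself.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words in a block of text."""
    return len(text.split())


def long_enough(text: str, minimum: int = 100) -> bool:
    """Return True if the block has at least `minimum` words.

    The default of 100 is a rule of thumb, not a documented limit.
    """
    return word_count(text) >= minimum


sample = "Potatoes should not be frozen from a raw state."
print(word_count(sample), long_enough(sample))
```

A single sentence like the sample falls well short, so it would be worth grabbing the whole paragraph instead.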

An interesting browse option, especially if I don’t know a lot about a subject.

Google Noting Fresh Results Are Fresher Than Ever, I Think

I find this really exciting, since it wasn’t that long ago that it might take 4-6 weeks for a Web page to get into a search engine. Google has announced that instead of showing a crawl date, search results will show how many hours ago the page was crawled.

At least, it’s supposed to. I ran several searches on Google and GooFresh and did not find any instance where result pages were showing a crawl of x hours ago. Even a search for Official Google Blog didn’t bring me any joy. However, while I was running searches trying to find evidence of what the blog was talking about, I learned something interesting.

I wondered what kind of results you’d get if you just searched for the date: that is, "August 16 2007" (at this writing.) So I plugged that in as a search. I got interesting results, but I was looking for fresh, recent content. Instead I was getting Wikipedia’s page for the date, some upcoming events and placeholder pages, and stuff like that.

So, thinking about how many blogs arrange their pages, and how they include dates in the URLs, I added an inurl: syntax to the search so it looked like this:

"August 16 2007" inurl:2007

And I got a lot of neat blog content, as well as lots of dynamically-generated pages (an unexpected bonus.) Very fresh. (Still didn’t see that new setup Google was mentioning though.)

And just so you don’t accuse me of being an ugly American, I also tried a search for "16 August 2007" inurl:2007. There were fewer results, but they had a definite European/International flavor…
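If you wanted to build both versions of that query for any date, something like the following would do it. It is a sketch under my own assumptions: the function names are made up for illustration, the month names are hardcoded in English, and the search URL is simply the familiar google.com/search?q= form, not anything Google documents as an API.

```python
from datetime import date
from urllib.parse import urlencode

# Hardcoded so the output does not depend on the system locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def dated_queries(day: date) -> list[str]:
    """Build the US-style and international-style date queries,
    each narrowed with inurl:<year>."""
    month = MONTHS[day.month - 1]
    us_style = f'"{month} {day.day} {day.year}" inurl:{day.year}'
    intl_style = f'"{day.day} {month} {day.year}" inurl:{day.year}'
    return [us_style, intl_style]


def search_url(query: str) -> str:
    """Turn a query string into a search URL (assumed URL form)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


for query in dated_queries(date(2007, 8, 16)):
    print(query)
    print(search_url(query))
```

Swapping in today's date gives you both phrasings to try side by side.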

This post came from ResearchBuzz, a site with news and information about online data collections. Visit us at ResearchBuzz.com .

What Keywords Are Being Used On Social Sites?

Let me risk beating a point into the ground: in order to find things when you search you have to use the proper vocabulary. And on the Internet, the proper vocabulary is changing constantly, especially in the arenas of tech and popular culture. SiteVolume, at http://sitevolume.com/, is a nifty little tool that shows you how often particular words are being used across certain social sites like MySpace, Digg, and Flickr.

You can enter up to five words and get a bar graph of how often the words appear on each site. It was interesting comparing the names of Web tools, search engines, and technologies. Especially the word Twitter, which was almost nonexistent in some places but which showed up fairly frequently in others (besides Twitter itself, of course.)

The presentation is pretty slick but the methodology behind getting the numbers is pretty simple; it’s a Google search. (You can get more information on the questions and comments page.) If I wanted to do some general searches, but wasn’t sure which keywords to start from, I’d compare their frequency on these sites and perhaps pick the most popular (or the least popular if I was trying to limit my results.) Fun toy.
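For the curious, here is roughly how you could set up the same kind of comparison by hand. To be clear, this is my guess at the general idea, not SiteVolume's actual method: I'm assuming a quoted keyword plus a site: restriction per site, the domain list is my own, and the sketch only builds the queries. You would still have to run them and read the result counts yourself.

```python
# Hypothetical site list; SiteVolume's real list and queries may differ.
SITES = {
    "MySpace": "myspace.com",
    "Digg": "digg.com",
    "Flickr": "flickr.com",
}


def site_queries(word: str, sites: dict[str, str] = SITES) -> dict[str, str]:
    """Build one site-restricted query per social site for a keyword."""
    return {name: f'"{word}" site:{domain}' for name, domain in sites.items()}


for name, query in site_queries("twitter").items():
    print(f"{name}: {query}")
```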