Using Google’s New Features to Get Twitter Information

May 14th, 2009

I like Twitter a lot. I use it every day. Thanks to Twitterdoodle, I am even using Twitter to restart doing the “LittleBuzz” feature that I used to do on ResearchBuzz. But one thing about Twitter really drives me wild, and that, as you might guess, is the search engine.

There are so many brilliant things about Twitter’s advanced search. You can search near places. You can filter your searches by whether a tweet has links. You can filter by a particular date (though with the speed at which Twitter adds material, I seriously wish you could search by a decimal Julian date like 2454966.30038. Wow, that’s off-the-hook nerdy, isn’t it?)
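For the curious, here is a minimal sketch of how a decimal Julian date like the one above can be computed. It assumes the standard offset of 2440587.5 between the Unix epoch and the Julian day count. The time of day in the example is my own reconstruction from the number, not something Twitter or the post states.

```python
from datetime import datetime, timezone

# Julian date of the Unix epoch, 1970-01-01 00:00 UTC.
UNIX_EPOCH_JD = 2440587.5
SECONDS_PER_DAY = 86400


def julian_date(moment):
    """Return the decimal Julian date for a timezone-aware datetime."""
    return moment.timestamp() / SECONDS_PER_DAY + UNIX_EPOCH_JD


# Assumed timestamp: the evening (UTC) of the day this post was published.
posted = datetime(2009, 5, 14, 19, 12, 33, tzinfo=timezone.utc)
print(f"{julian_date(posted):.5f}")
```

Five decimal places works out to a resolution of a bit under one second, which is about the granularity you would want for a stream moving this fast.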

But what you CAN’T do is search by what twittering folks have in their bios, or what they’ve listed as their Web site. I can do a search for twitters within 50 miles of New York City, sure, but I can’t search for people who have “gov” in their Web page address.

At least I can’t on TWITTER. However, with a little shaking and baking I can run that search, and many other interesting Twitter searches, on GOOGLE. Furthermore, I can use Google’s new “sort by date” option for Web results and get the newest results at the top. Let the goofing around commence!

(Before we get started, a disclaimer: I always assume that an external search engine will never index as much as an internal search engine. Therefore I assume that these numbers are inaccurate, probably incomplete. Furthermore, they shifted as I tested the searches. Don’t take them as gospel, just as Google’s count. Thank you.)

In order to focus Google just on Twitter we’ll have to use Google’s special syntax. Special syntax is just a way to focus Google’s searching attention on a particular site or part of a Web page. The way I use it here should be fairly self-explanatory, but if you need additional guidance, check out this page. Okay, let’s start with the basics. Do you want to get an idea of how many Twitter profiles have been indexed by Google? Start with this search:

intitle:"on Twitter" site:twitter.com

The search for “On Twitter” in your page result titles just means you’re looking for home pages, not status pages or things like that. Looking for this search in “recent” results gives you 15.3 million results, though a search for results of the past 24 hours “only” finds you 2.4 million results. Yikes.

Well, what about individual Tweets? Pie-simple. Just search for this:

(inurl:status | inurl:statuses) site:twitter.com

With this query you’re telling Google to search the Twitter site for URLs that have “status” or “statuses” in them. (I thought all Tweets were stored under a “status” directory but I saw several with “statuses” instead, so I’m searching for either in the query above. That’s what the pipe ( | ) symbol means.) The count this time? The recent results number is 53.2 million. (Boggle.) The 24-hour results number is 10.1 million.

Thus far I’ve shown you queries without specific keywords, though you could add keywords to those queries if you wanted to (how many Twitter people are named Fred, etc.). But this query works best with a keyword:

(inurl:favorites | inurl:favourites) site:twitter.com keyword

This query searches just those tweets that users have marked as “favorites”; in this case the query is searching for the keyword “keyword”. However, you can change that to any word or phrase. Try a username:

(inurl:favorites | inurl:favourites) site:twitter.com mattcutts

That will find Twitterers who have favorited Tweets by or about the famous Googler Matt Cutts.
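The status and favorites queries share one shape: a group of inurl: alternatives joined by the pipe, the site: restriction, and an optional keyword. Here is a rough sketch of building them in Python. The function names are my own invention for illustration; the only outside assumption is Google's ordinary search URL with its q parameter.

```python
from urllib.parse import urlencode


def or_group(operator, values):
    """Join alternatives with Google's pipe (OR), e.g. (inurl:a | inurl:b)."""
    terms = [f"{operator}:{value}" for value in values]
    if len(terms) == 1:
        return terms[0]
    return "(" + " | ".join(terms) + ")"


def twitter_google_query(url_words, keyword=""):
    """Build a Google query limited to twitter.com URLs containing any of url_words."""
    parts = [or_group("inurl", url_words), "site:twitter.com"]
    if keyword:
        parts.append(keyword)
    return " ".join(parts)


def google_search_url(query):
    """URL-encode the query into a Google search address."""
    return "https://www.google.com/search?" + urlencode({"q": query})


print(twitter_google_query(["status", "statuses"]))
print(twitter_google_query(["favorites", "favourites"], "mattcutts"))
print(google_search_url(twitter_google_query(["status", "statuses"])))
```

The sketch only builds the query text. Sorting by date or limiting to the past 24 hours is still done with Google's options on the results page.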

At the beginning of this article I mentioned that I didn’t like Twitter’s inability to let me search by information in a twitterer’s Web address. You can add that Twitter search to Google by using Google’s wildcard. Google has a wildcard, *, that used to stand for one word. That is, if you used * in a phrase, it would find the phrase with any one word substituted for the wildcard. (Searching for “I am * man” would find “I am Iron Man”, “I am modern man”, “I am old man”, etc.) Now the * is more flexible: it just stands for some space. If you use it with Google now, you’re kind of doing a proximity search. (A search for “I am * man” now will find “I am Iron Man” but also “I am an old man” or “I am the real Spider-Man”.)
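To make the difference between the old and new wildcard concrete, here is a toy model in Python. It is only an illustration of the behavior described above, not how Google actually matches. The max_gap parameter is my own assumption, since I don't know how much "space" the wildcard really covers.

```python
import re


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wildcard_match(pattern, text, max_gap=1):
    """Check whether text contains pattern, with * standing for 1..max_gap words.

    The pattern must contain exactly one *. max_gap=1 models the old
    one-word wildcard; a larger value models the looser proximity behavior.
    """
    before, after = (tokens(part) for part in pattern.split("*"))
    words = tokens(text)
    for start in range(len(words)):
        if words[start:start + len(before)] != before:
            continue
        gap_start = start + len(before)
        for gap in range(1, max_gap + 1):
            after_start = gap_start + gap
            if words[after_start:after_start + len(after)] == after:
                return True
    return False


for phrase in ["I am Iron Man", "I am an old man", "I am the real Spider-Man"]:
    old = wildcard_match("I am * man", phrase, max_gap=1)
    new = wildcard_match("I am * man", phrase, max_gap=3)
    print(f"{phrase!r}: old={old} new={new}")
```

In this model only “I am Iron Man” matches under the one-word rule, while all three phrases match once the gap is allowed to grow.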

You can use this wildcard in conjunction with the “Web:” part of Twitter’s profile like this:

intitle:"on Twitter" site:twitter.com "Web * gov * bio"

You’re using the Web: part of the profile and the Bio: part of the profile to set boundaries on the area where you want to search. Then you’re adding a couple of wildcards to account for any other words that might be there, then you’re searching for the string “gov” in the Web address, so ideally you’re finding government employees with Twitter accounts. (There were 252 results when I ran this search, which seemed low.) You can also do entire domain names. Let’s look for CNN people:

intitle:"on Twitter" site:twitter.com "Web * cnn.com * bio"
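Since the “gov” and “cnn.com” searches differ only in the string placed between the wildcards, the query can be sketched as a small template. The function name is hypothetical, and the template assumes Twitter profile pages keep showing the Web and Bio labels in that order.

```python
def profile_web_query(web_fragment):
    """Build a Google query for Twitter profiles whose Web address contains web_fragment."""
    return f'intitle:"on Twitter" site:twitter.com "Web * {web_fragment} * bio"'


print(profile_web_query("gov"))
print(profile_web_query("cnn.com"))
```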

You might have to do some filtering of search results like this if a company domain name is used for multiple things. For example, if you wanted to search for Yahoo people you might also find people who had http://my.yahoo.com as their Web URL. So don’t assume every result you get will automatically be affiliated with the domain name for which you searched.
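If you have collected the Web addresses from a batch of results, that filtering can be roughed out by looking at the hostname. This is a sketch under my own assumptions: the function names are made up, the sample URLs are illustrative, and deciding which hosts to exclude (my.yahoo.com here) is a judgment call you would make per company.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase hostname of a URL, or an empty string if there is none."""
    return (urlsplit(url).hostname or "").lower()


def is_company_site(url, domain, excluded_hosts=()):
    """True if the URL is on domain or one of its subdomains and not explicitly excluded."""
    host = host_of(url)
    if host in excluded_hosts:
        return False
    return host == domain or host.endswith("." + domain)


# Illustrative profile Web addresses, not real search results.
candidates = [
    "http://www.yahoo.com/",
    "http://my.yahoo.com",
    "http://notyahoo.com/about",
]
for url in candidates:
    print(url, is_company_site(url, "yahoo.com", excluded_hosts={"my.yahoo.com"}))
```

Even a URL that passes this check only tells you what someone typed into their profile, so it still isn't proof of affiliation.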

Google’s new features allow you to slice results into meaningful, timely chunks. Twitter is generating enough data that even searching for narrowly defined time periods generates a LOT of current information. It’s a match made in Heaven. Stay tuned as I make some more attempts to put these two resources together.

Categories: News

Information Trapping and Twitter

January 5th, 2009

Ohai, I’m back.

Yeah, gone for a while. That darned meatspace. All kinds of stuff can happen and the next thing you know you haven’t put anything on your Web site in six months and people are e-mailing you asking if you’ve lost the keyboard.

But one of my 2009 resolutions was to get back here, since I like doing ResearchBuzz, I’m still crazy about search engines, and I missed y’all. There’s probably nobody left (who hangs around an empty RSS feed for months and months?), but if you’re still out there, I did miss you and I’m glad to be back.

I have been doing my Tech Talk thing over at WRAL, so I have been keeping up with my information trapping to a certain extent. But I had not yet delved into Twitter as a way to trap news and information about online search resources. I’ve been playing with it some this evening and wanted to share some conclusions. The Twitter search interface is available at http://search.twitter.com/; I’m Twittering at http://twitter.com/researchbuzz.

The basic Twitter search is a simple keyword search with the ability to use phrases, exclude words, etc. I tried a couple of sample searches and looked at the RSS feeds, and was struck first by the near-complete lack of overlap between Twitter and my more traditional news sources. There’s some, but far less than I expected. The second thing I noticed is that I think I’ll be excluding more words than I include; the ability to quickly post and the apparent 15-item limit for Twitter RSS feeds mean you really have to work to clamp down the flow and narrow down the kinds of search results you get.

The first thing I figured out is to always add -RT to your search, so you don’t get piles of retweets. You’ll still get a few but it gets a lot of noise out of your feed.

The second thing I noticed is that I can take advantage of the patterns of Twittering ‘bots. Searching Twitter for “online library” gets a lot of results from one particular ‘bot, but they’re mostly formatted in the same way so they’re easy to remove.

Third is that you can get a good idea of vocabulary even from one page of search results, and Twitter is tolerant of long queries. So if I want to get news about search engines but not necessarily SEO or rankings and placement, I’m going to have very little luck with “search engine”. I will however do much better with this:

"search engine" -rt -marketing -rankings -myths -optimisation -optimization -visibility -placement

… and even one page of those results only goes back about six hours. But do you see what I mean about excluding more words than I include?
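Because the exclusion list keeps growing as you spot new noise words, it can help to keep it as a list and generate the query from it. Below is a minimal sketch. The function names and sample tweets are made up, and the local keep() check is only a rough approximation of what excluding words does, not Twitter's actual matching.

```python
import re


def twitter_query(phrase, excluded):
    """Build a search string: a quoted phrase followed by -word exclusions."""
    return " ".join([f'"{phrase}"'] + [f"-{word}" for word in excluded])


def keep(text, phrase, excluded):
    """Roughly mimic the query: the phrase must appear and no excluded word may."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z0-9]+", lowered))
    blocked = {word.lower() for word in excluded}
    return phrase.lower() in lowered and not (words & blocked)


EXCLUDED = ["rt", "marketing", "rankings", "myths", "optimisation",
            "optimization", "visibility", "placement"]

print(twitter_query("search engine", EXCLUDED))

# Hypothetical tweets, for illustration only.
samples = [
    "New search engine for recipes just launched",
    "RT @someone: new search engine for recipes",
    "Five search engine optimization tips",
]
for tweet in samples:
    print(keep(tweet, "search engine", EXCLUDED), tweet)
```

Adding a word to EXCLUDED updates both the query string and the local check, which makes it easy to try out a new exclusion against tweets you have already seen.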

After I’d gotten a feel for what the keyword searches could do I went and took a look at the advanced search options, available at http://search.twitter.com/advanced. The geographic options are cool, though unfortunately not so useful in the kind of stuff I want to search for. On the other hand, the ability to limit Tweets to only those which have links is very nice (see the checkbox down at the bottom). There is probably a way to take advantage of the emoticon search but I haven’t figured it out yet. You can also limit the Twitters to those which ask questions.

Searching Twitter is completely backwards from searching a full-text search engine, especially with an eye toward getting a usable and constant flow of information. On a regular search engine, you want to use as many search terms as possible to narrow down your results from a vast ocean of data. In Twitter, there’s still a vast ocean of data, but it’s divided into trillions of drops of water. It’s possible (and from what I’m seeing probably even a better idea) to narrow down with what you DON’T want, instead of trying to guess the right set of keywords from no more than 140 characters at a time.

I’ll be doing more experiments. In the meantime you can check out http://search.twitter.com/operators for a list of Twitter operators and special syntax.