This year, several companies
have arrived on the scene promising to improve on traditional
keyword searching through "meaning-based" search
technology. Their premise is a good one: keyword searching
is "dumb" and can return many irrelevant results.
When I search for the word "portal," I mean web
portal; I don't want to see material about a science fiction
game called Portals. When I'm seeking information about "The
Big Tuna," I don't want to go on a fishing expedition.
I just want to know whether Bill Parcells plans to coach
again. (Note to self: suggest this meaning for "Big
Tuna" to the Oingo staff.)
I've recently talked with three companies that are working
on this problem. Two of them, Oingo and Simpli, are developing
proprietary lexicons that allow users to zero in on particular
meanings of a given keyword. A third, ejemoni, is developing
more ambitious technology that can scan the text of a document
and analyze the relationships among its words, helping to place
the document in categories that describe what it is about overall.
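To make the lexicon idea concrete, here is a toy sketch in Python. It is not how Oingo or Simpli actually work (their lexicons are proprietary). The SENSE_LEXICON entries, the function names, and the sample documents are all hypothetical. The idea is that the user picks a sense, and the lexicon supplies context words that a matching document should also contain.

```python
import re

# Hypothetical sense lexicon, for illustration only: each keyword maps to
# named senses, and each sense lists context words that signal it.
SENSE_LEXICON: dict[str, dict[str, set[str]]] = {
    "portal": {
        "web portal": {"web", "site", "internet", "yahoo", "homepage"},
        "science fiction game": {"game", "science", "fiction", "player"},
    },
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def search_by_sense(docs: list[str], keyword: str, sense: str) -> list[str]:
    """Return the docs that contain `keyword` plus at least one context
    word for the chosen `sense`."""
    context = SENSE_LEXICON[keyword][sense]
    matches = []
    for doc in docs:
        words = tokenize(doc)
        if keyword in words and words & context:
            matches.append(doc)
    return matches


docs = [
    "Yahoo is the busiest web portal on the Internet.",
    "A review of the portal game for science fiction fans.",
    "The portal opened onto a quiet garden.",
]
print(search_by_sense(docs, "portal", "web portal"))
# ['Yahoo is the busiest web portal on the Internet.']
```

The third document mentions "portal" but matches neither sense, so it drops out. Everything depends on the lexicon's coverage: a keyword or sense that isn't listed simply can't be selected.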
All of these technologies have potentially widespread applications.
As one contributor to Traffick Forums argued, however, at
this stage they don't do much that a conscientious searcher
couldn't do on their own with simple Boolean operators such
as AND and NOT.
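The forum contributor's point is easy to see in miniature. Below is a minimal Python sketch of AND/NOT filtering over a toy inverted index. The function names, sample documents, and word-splitting rule are hypothetical, and real search engines each had their own query syntax.

```python
import re


def build_index(docs: list[str]) -> dict[str, set[int]]:
    """Build a toy inverted index mapping each lowercase word to the ids
    of the documents that contain it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in enumerate(docs):
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index


def boolean_search(index: dict[str, set[int]], doc_count: int,
                   all_of: list[str], none_of: list[str]) -> set[int]:
    """AND together every word in `all_of`, then NOT out every word in `none_of`."""
    result = set(range(doc_count))
    for word in all_of:
        result &= index.get(word, set())   # AND: keep docs containing the word
    for word in none_of:
        result -= index.get(word, set())   # NOT: drop docs containing the word
    return result


docs = [
    "Yahoo is the busiest web portal on the Internet.",
    "A web review of the portal game for science fiction fans.",
    "Tips for landing a big tuna on a fishing trip.",
    "The Big Tuna, Bill Parcells, may coach again.",
]
index = build_index(docs)
print(boolean_search(index, len(docs), ["portal", "web"], ["game"]))    # {0}
print(boolean_search(index, len(docs), ["big", "tuna"], ["fishing"]))   # {3}
```

Excluding "fishing" is a blunt instrument. It would also throw away any Parcells story that happens to mention a fishing trip, and meaning-based search hopes to close that gap.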
Let's take a peek at these three entrants into the meaning-based
search field. We're sure to hear more from them in the future.
Oingo
Oingo is more typical of a Silicon Valley startup than
the other two: it's brash, young, fun, and likely to outwork
you if it can't out-think you. It's got a working product,
and it's got it now. Its team of linguists has built
a large lexicon of common meanings for search terms, and
the company is now offering its technology "open
source" as a front end for any directory or site that
wishes to use it. The default directory used to demo
the service is the ever-present Open Directory Project.
If you try Oingo, you'll see where it's headed. For
the time being, however, it's not about to replace my favorite
search engines. (Lately, I have been using Google and Ixquick,
two engines that tend to return highly relevant results
without much effort spent devising search terms.)
Down the road, the Oingo team believes it's only
a matter of time before a major search company finds the
technology useful. This could well be true. Major search
and portal companies today are not shy about combining
external technologies to improve their results.
Go2Net uses Direct Hit (the popularity engine); MSN offers
LookSmart directory results; various others have chosen
the Open Directory for categorized results. Meaning-based
search is going to find its way into the mix one way or
another.
SimpliFind
SimpliFind has the same basic idea as Oingo, but seems
to have a bit more scientific muscle on board, from the
likes of Brown and Princeton Universities. Its lexicon,
WordNet, was developed over many years by cognitive and
linguistic scientists at Princeton.
Testing the product was satisfying. This technology is
sure to find its way into many databases, and might become
a force on the Internet.
Then again, holes in the database reinforce the fact that
SimpliFind, like Oingo, is going to have to rely on considerable
customer-driven customization and brute force to respond
to very human twists and turns in language, history, commerce,
and popular culture. I searched for "Watergate"
and Simpli came back with "No Meaning Found."
Now there's some social amnesia for you! (Oingo has them
beat on that one, which underscores the fact that high-level
cognitive science alone won't be enough to make this technology
practical.)
One question for the scientists: could XML (eXtensible Markup
Language) make their current approach irrelevant? Tomorrow's
Internet is going to involve more than determining the different
meanings of words in the English dictionary. XML may allow
meanings to be hard-wired to ever more specific contexts, making
search technology ever more useful. We'll be able to search
for documents, companies, publications, products, people,
spare parts, geographic locations, stock prices, and so on,
without seeing all the other junk that shares similar keywords.
At least that's what I read in The Economist magazine.
ejemoni
At this point, we can only speculate about the power of
ejemoni, another sophisticated startup working on meaning-based
search. Ejemoni is well-financed by an influential angel
investor, and has what some observers believe may be a major
scientific breakthrough on its hands. The core idea appears
to be the ability to find related documents by analyzing
the content of whole documents and placing them into an
overall category similar to the Library of Congress classification
system.
A cool feature that ejemoni's technology may make possible
is the ability to highlight a whole paragraph, or even several
paragraphs, of text and search for related documents based
on all of the words you highlight. The company stresses that
its algorithm will not simply look at keyword density but
will genuinely analyze the meanings of documents based on
word relationships. Obviously,
there is a lot of potential in a search technology that
works better as you feed it more words. It might even be
able to approximate what Ask Jeeves only pretends to do,
which is to understand your questions! And yes, Jeeves,
ejemoni is already voice-recognition ready, according to
the company.
At this stage, it's too early to see ejemoni in action.
We'll catch up with this one again later.