Using Wikipedia to Improve Search Quality
Bill Slawski put up an interesting post over the weekend titled “Can Web Search Use Wikipedia to Understand References to Names?” In it, Bill references a paper by Microsoft researcher Silviu Cucerzan. The gist of the paper is that search engines can use Wikipedia as a cross-referencing source: when a search engine sees a name like “Bush” in a document, Wikipedia can help it work out which Bush is being referred to (George W. Bush, his father, Reggie Bush, or someone else).
In principle, what the paper discusses is how the context in which a particular name is used in a web document can be compared to the context in which that name is used on Wikipedia. Simplistically put, if the reference to “Bush” appears on a site about the New Orleans Saints, the likelihood that it’s about Reggie Bush is quite high. The search engine can use an external reference source, such as Wikipedia, as a method of validation, by trying the various Wikipedia pages for people with the last name Bush and noting the references each one has in common with the document.
For example, the Wikipedia page and the web page being analyzed probably both use phrases like New Orleans Saints, football, running back, etc. By developing this sense of context, the web page being analyzed can be more properly classified, even if the page never uses the running back’s full name. So if the user searches on Reggie Bush, the search engine will know that the particular web page can be considered as relevant to the query.
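To make the idea concrete, here is a toy sketch in Python. It is not the method from Cucerzan's paper, just a simple bag-of-words comparison that scores each candidate by how much vocabulary the web page shares with it. The candidate contexts are short hand-written stand-ins for Wikipedia page text, and the names `CANDIDATE_CONTEXTS`, `tokenize`, `cosine`, and `disambiguate` are made up for this illustration.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical stand-ins for the text of each candidate's Wikipedia page.
# A real system would work from the actual page content.
CANDIDATE_CONTEXTS = {
    "George W. Bush": "president united states texas governor white house",
    "George H. W. Bush": "president united states vice president gulf war",
    "Reggie Bush": "football running back new orleans saints nfl",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(page_text, candidates=CANDIDATE_CONTEXTS):
    """Return the best-matching candidate name and all similarity scores."""
    page_vector = Counter(tokenize(page_text))
    scores = {
        name: cosine(page_vector, Counter(tokenize(context)))
        for name, context in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


page = ("Bush rushed for two touchdowns as the New Orleans Saints won; "
        "the running back looked sharp.")
best, scores = disambiguate(page)
print(best)
```

The page never mentions the first name, yet the shared words (New Orleans, Saints, running back) are enough to pick out Reggie Bush. Production systems presumably use far richer signals than raw word overlap, but the underlying comparison of contexts is the same in spirit.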
It makes for interesting reading, and provides some insight into the types of analysis that search engines perform. What makes this even more daunting to think about is that this is just one example of thousands of such scenarios that search engines deal with. It’s a complicated process, indeed.