New Data Mining Tool Will Let You Make Your Own Private Search Engine

The Minerazzi project is a platform that allows you to build topic-specific search engines without programming knowledge. The brainchild of Dr. Edel Garcia, the Minerazzi project aims to allow anyone to build small, on-topic search indexes. His hope is that anyone, regardless of technical background, can be involved in data mining and learning through discovery by building these search indexes.

The Minerazzi project was initially intended as an indexing project. When first conceived by Dr. Garcia, it was hosted at the Microsoft Innovation Center (MIC) of Inter American University of Puerto Rico. However, the project was diluted and changed numerous times. A few weeks after initially presenting the project concept at SES New York 2012, Dr. Garcia moved the project out of the MIC and redesigned it as a self-service search platform.

A little over a year later, the Minerazzi project is in beta testing. With the help of local librarians and developers, Dr. Garcia.

Once an index is built, users can start mining email addresses, phone numbers and other keywords straight from search result pages. Minerazzi also allows you to identify sets of keywords with common features such as number of occurrences, byte size, etc.
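To give a rough sense of what mining contact details from a result page can look like, here is a minimal Python sketch. It is not Minerazzi's code: the two regular expressions, the `mine` function and the sample text are all assumptions made for illustration, and the patterns are deliberately simple (the phone pattern only covers ten-digit North American formats).

```python
import re

# Simplified patterns, assumed for this sketch only; real-world email and
# phone formats are far more varied than these expressions allow.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

def mine(text):
    """Collect the unique email addresses and phone numbers found in text."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "phones": sorted(set(PHONE_RE.findall(text))),
    }

# Hypothetical snippet standing in for the text of a search result page.
SAMPLE = (
    "Contact booking@example.com or call (787) 555-0142. "
    "Press: press@example.org, 787-555-0199."
)

print(mine(SAMPLE))
```

Deduplicating with a set before sorting keeps repeated addresses on a page from inflating the list.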

For businesses, Minerazzi allows an organization to build a small, searchable index relevant to any specific set of data. Products and services, market information, even a competitor index can be built quickly for employees to search and mine. Such a unique, topic-specific index can be ideal for researchers to store, share and search information.

When released to the public, the service will require users to sign up and open an account. Once that account is open, you can start crawling.

Using it is relatively simple. Pick your vertical, such as news or sports, or use something more meaningful, like the local music scene or internal departmental resources, and Minerazzi helps you search and index documents on that topic. Minerazzi then crawls the Web in search of your documents; when it finds matches, it adds them to your index. That data can then be searched by friends, clients, co-workers or anyone else with whom you share access.

Minerazzi uses 11 different interactive search modes to help control the data that is crawled. Some modes are familiar, like AND, which requires all terms in your search, and OR, which looks for documents that match any term specified. There are other search modes like NOT AND, NOR, EXCLUSIVE OR and even PROXIMITY, which matches two terms, in any order, that are separated by no more than a number you specify.
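For readers who want to see how such modes behave, the sketch below runs six of them over a tiny, made-up collection using Python sets. Everything here is an assumption for illustration: the `DOCS` collection and the `search`, `matching` and `near` helpers are hypothetical, NOT AND is read as the textbook NAND (everything except documents containing both terms), and PROXIMITY is read as "word positions differ by at most the chosen number." Minerazzi's own definitions may differ.

```python
import re

# Hypothetical toy collection; the article does not describe Minerazzi's index format.
DOCS = {
    1: "local jazz bands play downtown tonight",
    2: "jazz festival tickets on sale",
    3: "downtown parking for festival visitors",
    4: "local bands release new rock album",
}

def tokens(text):
    """Lowercase a document and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matching(term):
    """Return the IDs of documents that contain the term."""
    return {doc_id for doc_id, text in DOCS.items() if term in tokens(text)}

def near(text, a, b, distance):
    """True if a and b occur, in either order, at most `distance` positions apart."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

def search(mode, a, b, distance=1):
    """Run a two-term query in one of the modes named in the article."""
    everything = set(DOCS)
    has_a, has_b = matching(a), matching(b)
    if mode == "AND":
        return has_a & has_b
    if mode == "OR":
        return has_a | has_b
    if mode == "NOT AND":  # assumed to mean NAND
        return everything - (has_a & has_b)
    if mode == "NOR":
        return everything - (has_a | has_b)
    if mode == "EXCLUSIVE OR":
        return has_a ^ has_b
    if mode == "PROXIMITY":
        return {d for d, text in DOCS.items() if near(text, a, b, distance)}
    raise ValueError(f"unknown search mode: {mode}")

for mode in ("AND", "OR", "NOT AND", "NOR", "EXCLUSIVE OR"):
    print(mode, sorted(search(mode, "jazz", "festival")))
print("PROXIMITY", sorted(search("PROXIMITY", "local", "bands", distance=2)))
```

Treating each term as a set of document IDs makes every Boolean mode a one-line set operation; only PROXIMITY needs to look at word positions inside each document.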

The science behind these modes is sound. Two metrics, the ratio of AND to OR search results and the ratio of EXACT to AND results, provide some important signals. In addition to helping with mining content from your index, these ratios also provide important clues about the nature of a search engine index and its content.

“In general, we can compute other types of search mode results ratios to extract very useful information,” Garcia said. “With some of these ratios we can estimate the organic/inorganic incompatibility of keywords in a collection.”
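As a rough illustration of the two ratios mentioned above, the following Python sketch counts AND, OR and EXACT matches for a pair of terms in a small, invented collection and divides them. The collection, the `counts` and `ratio` helpers and the reading of EXACT as "the two terms appear side by side, in order" are assumptions, not Minerazzi's documented method. One thing that does hold by definition: the AND/OR ratio of document counts is the Jaccard similarity between the sets of documents containing each term.

```python
import re

# Hypothetical toy collection used only to make the arithmetic concrete.
DOCS = [
    "search engine marketing tips",
    "build a private search engine",
    "engine repair and search for parts",
    "engine oil change",
]

def tokens(text):
    """Lowercase a document and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def counts(a, b):
    """Count documents matching a AND b, a OR b, and the exact phrase 'a b'."""
    and_n = or_n = exact_n = 0
    for text in DOCS:
        words = tokens(text)
        has_a, has_b = a in words, b in words
        and_n += has_a and has_b
        or_n += has_a or has_b
        # EXACT is assumed to mean the terms are adjacent and in order.
        exact_n += any(pair == (a, b) for pair in zip(words, words[1:]))
    return and_n, or_n, exact_n

def ratio(numerator, denominator):
    """Divide two result counts, returning 0.0 when there is nothing to divide by."""
    return numerator / denominator if denominator else 0.0

and_n, or_n, exact_n = counts("search", "engine")
print("AND/OR:", ratio(and_n, or_n))        # share of matching docs that contain both terms
print("EXACT/AND:", ratio(exact_n, and_n))  # share of co-occurrences that form the phrase
```

Under these assumptions, an AND/OR ratio near 1 means the two terms nearly always appear together, while an EXACT/AND ratio near 1 means that when they do, they almost always appear as a phrase.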

Garcia emphasizes that Minerazzi places users at the center of the search experience. Rather than limiting users to staring down a list of links and clicking, Minerazzi lets them interact with the returned data.

“In my book, that is a technology waste. It is like sending your eyes to ‘window shopping’ across an oversized digital mall. Boring!” Garcia told Search Engine Watch. “With Minerazzi, users interact at query time with search result pages, extracting information that matter to them, and doing something with that information.”

Minerazzi is still in beta testing with no official public launch date at this time. Garcia and his team are hoping to have it available within the coming weeks.
