Efficient Open Source Search Engine
Use Cases and Deployment Scope
Apache Lucene is used across multiple applications where the data is updated continuously. Because it is open source and operates efficiently, it's an ideal solution for implementing dynamic text search. The accuracy of the search results is impressive, and it wouldn't make business sense to implement an external product like Google search for a small-scale data set.
Pros
- Fast indexing: with proper optimization, I can index a gigabyte of data in 2 minutes.
- Easy integration with web crawlers
- Quick and accurate results
- Flexible sorting options for results, based on the search field and relevance
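
To make the sorting point concrete, here is a minimal, library-free sketch of ordering results either by relevance or by a field value. It is not Lucene's API: the `Hit` class, its `score` and `fields` attributes, and the `sort_hits` helper are hypothetical names used only for illustration, and the sketch assumes each result carries a relevance score plus stored field values.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """Hypothetical search hit: a document id, a relevance score, stored fields."""
    doc_id: int
    score: float
    fields: dict = field(default_factory=dict)


def sort_hits(hits, by=None, reverse=False):
    """Order hits by relevance (the default) or by the value of a stored field."""
    if by is None:
        # Relevance order: highest score first, lower doc id first on ties.
        return sorted(hits, key=lambda h: (-h.score, h.doc_id))
    # Field order: sort on the (field value, doc id) pair.
    return sorted(hits, key=lambda h: (h.fields[by], h.doc_id), reverse=reverse)


# Made-up sample data for the illustration.
hits = [
    Hit(1, 0.4, {"title": "apple"}),
    Hit(2, 0.9, {"title": "zebra"}),
    Hit(3, 0.7, {"title": "mango"}),
]
by_relevance = [h.doc_id for h in sort_hits(hits)]
by_title = [h.doc_id for h in sort_hits(hits, by="title")]
```

The same result set comes back in a different order depending on whether the caller asks for relevance or for a specific field, which is the flexibility described in the list above.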
Cons
- Scalability issues, especially when the index grows to millions of documents.
- The Boolean scoring model could be better.
- Difficult to set up in a cluster-based environment.
Likelihood to Recommend
Apache Lucene is a perfect text search implementation where heap space usage needs to be kept to a minimum. It also enables search across various fields and, most importantly, searching and indexing can happen simultaneously. The only scenario where it might be less appropriate is when the index grows too large. We have witnessed a few scalability issues where searches take a while when the index is very large.
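
As a rough illustration of field-based search running alongside indexing, the toy sketch below keeps an in-memory inverted index keyed by field and term. It is an assumption-laden teaching example, not how Lucene is implemented: the `TinyIndex` class and its `add` and `search` methods are hypothetical, tokenization is a plain whitespace split, and a single lock stands in for whatever coordination a real engine uses.

```python
import threading
from collections import defaultdict


class TinyIndex:
    """Toy in-memory inverted index keyed by (field, term). Not Lucene's design."""

    def __init__(self):
        self._postings = defaultdict(set)  # (field, term) -> set of doc ids
        self._lock = threading.Lock()      # guards concurrent add/search calls

    def add(self, doc_id, fields):
        """Index a document given as {field_name: text}."""
        with self._lock:
            for name, text in fields.items():
                for term in text.lower().split():
                    self._postings[(name, term)].add(doc_id)

    def search(self, field_name, term):
        """Return the sorted ids of documents whose field contains the term."""
        with self._lock:
            return sorted(self._postings.get((field_name, term.lower()), ()))


index = TinyIndex()
index.add(1, {"title": "Lucene in Action", "body": "fast text search"})

# Index a second document on another thread while the main thread searches.
writer = threading.Thread(
    target=index.add,
    args=(2, {"title": "Search Engines", "body": "lucene internals"}),
)
writer.start()
title_hits = index.search("title", "lucene")  # safe to call while writer runs
writer.join()
body_hits = index.search("body", "lucene")
```

Because the same term is tracked separately per field, a query against `title` and a query against `body` return different documents, and the lock lets a search proceed safely while another thread is still adding documents.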
