Microsoft Windows search is not fast, and it does not give good search results either. So I thought about writing my own search engine for the desktop. It should crawl the file system, extract the content and metadata, and finally deliver search results as good as Google's.
I also wanted to test some new technologies like JavaFX with embedded HTML5, Apache Lucene as a full-text search engine, Apache Tika as the content extraction framework, and other stuff. But before we dive deep into the internals, let's take a look at the frontend:
JavaFXDesktopSearch also comes with a visualization of the current full text index. It provides a clickable Sunburst diagram for this purpose. Basically it looks as follows:
Under the hood it uses d3js.org to visualize the Lucene index. Quite nice and fast, just try it out. The project is hosted at github.com/mirkosertic/FXDesktopSearch. The original version was deployed by WebStart, but WebStart support was dropped due to Oracle's changes to its security policies. Now JavaFXDesktopSearch is installed with JavaFX-based native installers, which also bundle the right Java runtime. Check out the releases on Google Drive.
Of course I want to say thank you to In-SideFX for the cool Undecorator tool, which can be found here.
Under the hood
I use a multithreaded pipes-and-filters architecture for file indexing. The FileSystemCrawler searches for files and puts them on the ContentExtractionQueue. The ContentExtractor takes entries from the ContentExtractionQueue, extracts the content and metadata with Apache Tika, and puts the content on the IndexWriterQueue. The LuceneIndexHandler takes content from the IndexWriterQueue and updates the Apache Lucene full-text index.
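To make the pipeline idea more concrete, here is a minimal sketch of three stages connected by two blocking queues. It is not the actual FXDesktopSearch code: the class PipelineSketch, the Content type, the sentinel objects and the queue capacity of 100 are assumptions made for illustration, a plain function stands in for the Apache Tika extraction, and an in-memory map stands in for the Lucene index.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical sketch of a pipes-and-filters indexing pipeline.
final class PipelineSketch {

    /** A file together with its extracted text (illustrative type). */
    static final class Content {
        final Path file;
        final String text;

        Content(Path file, String text) {
            this.file = file;
            this.text = text;
        }
    }

    // Sentinels that tell the next stage there is no more work.
    // They are compared by identity, so they cannot clash with real paths.
    private static final Path END_OF_FILES = Paths.get("end-of-files");
    private static final Content END_OF_CONTENT = new Content(END_OF_FILES, "");

    /** Runs crawler, extractor and index writer as three threads and returns the "index". */
    static Map<Path, String> index(List<Path> files, Function<Path, String> extractor) {
        // Bounded queues give back pressure: a fast stage blocks until the next one catches up.
        BlockingQueue<Path> contentExtractionQueue = new LinkedBlockingQueue<>(100);
        BlockingQueue<Content> indexWriterQueue = new LinkedBlockingQueue<>(100);
        Map<Path, String> index = new ConcurrentHashMap<>(); // stand-in for the Lucene index

        // Stage 1: the crawler puts every file it finds on the first queue.
        Thread crawler = new Thread(() -> {
            try {
                for (Path file : files) {
                    contentExtractionQueue.put(file);
                }
                contentExtractionQueue.put(END_OF_FILES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: the extractor turns files into content; unreadable files are skipped.
        Thread extractorStage = new Thread(() -> {
            try {
                for (Path file = contentExtractionQueue.take(); file != END_OF_FILES;
                        file = contentExtractionQueue.take()) {
                    try {
                        String text = extractor.apply(file);
                        if (text != null) {
                            indexWriterQueue.put(new Content(file, text));
                        }
                    } catch (RuntimeException e) {
                        // Extraction failed for this file; carry on with the next one.
                    }
                }
                indexWriterQueue.put(END_OF_CONTENT);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 3: the index writer stores the content.
        Thread indexWriter = new Thread(() -> {
            try {
                for (Content c = indexWriterQueue.take(); c != END_OF_CONTENT;
                        c = indexWriterQueue.take()) {
                    index.put(c.file, c.text);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        List<Thread> stages = Arrays.asList(crawler, extractorStage, indexWriter);
        stages.forEach(Thread::start);
        try {
            for (Thread stage : stages) {
                stage.join();
            }
        } catch (InterruptedException e) {
            stages.forEach(Thread::interrupt);
            Thread.currentThread().interrupt();
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Path, String> result = index(
                Arrays.asList(Paths.get("a.txt"), Paths.get("b.txt")),
                file -> "text of " + file.getFileName());
        result.forEach((file, text) -> System.out.println(file + " -> " + text));
    }
}
```

The nice thing about this layout is that every stage only knows its input and output queue, so a slow stage (usually content extraction) could probably be scaled by running more than one thread on the same queue.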
The frontend is a JavaFX/HTML5 hybrid. The search result page is generated as HTML5 by an embedded Jetty web server, with Freemarker as the templating engine. JSP is not an option, as it would require compilation, which cannot be done with a JRE alone. The interesting part is the JavaScript upcall from the HTML result. Every search result entry has an onclick JavaScript event handler. This event handler calls the DesktopGateway instance, which is provided by the JavaFX application itself. The JavaScript can now open local files by calling the DesktopGateway, which itself delegates to the Java Desktop.getDesktop().open() implementation.
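The gateway idea can be sketched roughly like this. It is only an illustration and not the real DesktopGateway class: the names DesktopGatewaySketch, FileOpener and openFile, and the JavaScript member name "desktop", are assumptions. The JavaFX wiring is shown as a comment so that the class compiles without JavaFX on the classpath; it relies on WebEngine.executeScript() and JSObject.setMember(), which is the usual way to expose a Java object to a page inside a WebView.

```java
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of a gateway object that JavaScript in a WebView can call.
// For a real WebView upcall the class and the called method should be public.
class DesktopGatewaySketch {

    /** Small seam so that the "open" action can be replaced, for example in tests. */
    interface FileOpener {
        void open(File file) throws IOException;
    }

    private final FileOpener opener;

    /** Default: delegate to the platform's default application for the file. */
    DesktopGatewaySketch() {
        this(file -> Desktop.getDesktop().open(file));
    }

    DesktopGatewaySketch(FileOpener opener) {
        this.opener = opener;
    }

    /** Called from JavaScript; returns false instead of throwing into the page. */
    public boolean openFile(String path) {
        if (path == null) {
            return false;
        }
        File file = new File(path);
        if (!file.exists()) {
            return false;
        }
        try {
            opener.open(file);
            return true;
        } catch (IOException | RuntimeException e) {
            // For example no desktop support (headless) or no application for this file type.
            return false;
        }
    }
}

// Wiring inside the JavaFX application (needs javafx.web, therefore only a comment):
//
//   JSObject window = (JSObject) webView.getEngine().executeScript("window");
//   window.setMember("desktop", gateway);   // keep "gateway" in a field so it stays reachable
//
// A search result entry in the generated HTML could then look like this
// (the path is just a placeholder):
//
//   <a href="#" onclick="desktop.openFile('C:/docs/report.pdf')">report.pdf</a>
```

Keeping the gateway small and returning a boolean means the page can react to a failed open (for example by showing a hint) without any Java exception leaking into JavaScript.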
The JavaFX/HTML5 hybrid is very powerful. It enables us to create cool user interfaces with full support of the whole Java stack using the described gateway approach. Also, the HTML application could be deployed standalone without desktop interaction, for instance to support mobile devices like tablets or smartphones.