Building a Better Desktop Search: A JavaFX and Lucene Powered Solution

A powerful desktop search application built with JavaFX that offers Google-like search capabilities for local file systems. The solution combines Apache Lucene for full-text search, Apache Tika for content extraction, and d3js for data visualization. Using a multi-threaded pipes and filters architecture, it efficiently crawls, indexes, and searches files while providing an intuitive JavaFX/HTML5 hybrid interface with interactive sunburst diagrams. The application demonstrates modern software architecture principles and leverages multiple open-source technologies to deliver fast, accurate search results with rich visualization capabilities.


Behold the masterpiece that AI hallucinated while reading this post:

"Little Search Engine That Could: How JavaFX Made Finding Files Fun Again"

(after I fed it way too many marketing blogs and memes)

Created using DALL-E 3


Microsoft Windows search is slow, and its results are often poor. So I thought about writing my own search engine for the desktop. It should crawl the file system, extract content and metadata, and finally deliver results of Google-like quality.

I also wanted to test some new technologies such as JavaFX with embedded HTML5, Apache Lucene as the full-text search engine, and Apache Tika as the content extraction framework. But before we dive into the internals, let's take a look at the frontend:

searchscreenshot3

FXDesktopSearch also comes with a visualization of the current full-text index, presented as a clickable sunburst diagram. It looks as follows:

fxdesktopsunburst

Under the hood it uses d3js.org to visualize the Lucene index. It is quite nice and fast, so just try it out. The project is hosted at github.com/mirkosertic/FXDesktopSearch. FXDesktopSearch is deployed via JavaFX-based native installers. The original version was deployed via WebStart, but WebStart support was dropped due to Oracle's changes to its security policies. Now FXDesktopSearch can be installed using native installers that also bundle the right Java runtime. Check out the releases on Google Drive.

Of course I want to say thank you to In-SideFX for the cool Undecorator tool, which can be found here.

Under the hood

I use a multi-threaded pipes and filters architecture for file indexing. The FileSystemCrawler searches for files and puts them on the ContentExtractionQueue. The ContentExtractor takes entries from the ContentExtractionQueue, extracts the content and metadata with Apache Tika, and puts the result on the IndexWriterQueue. The LuceneIndexHandler takes content from the IndexWriterQueue and updates the Apache Lucene full-text index.
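To make the flow between the stages more concrete, here is a minimal sketch of such a pipeline using only the JDK. It is not the actual FXDesktopSearch code: the class name PipelineSketch, the sentinel objects and the queue capacity are assumptions for illustration, the "extraction" step is a stand-in for Apache Tika, and a plain map stands in for the Lucene index.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a three-stage pipes and filters pipeline.
final class PipelineSketch {

    /** Carries extracted text from the extraction stage to the index stage. */
    static final class Extracted {
        final Path file;
        final String content;

        Extracted(Path file, String content) {
            this.file = file;
            this.content = content;
        }
    }

    // Sentinels ("poison pills"), compared by identity, that shut down the next stage.
    private static final Path NO_MORE_FILES = Paths.get("");
    private static final Extracted NO_MORE_CONTENT = new Extracted(NO_MORE_FILES, "");

    /** Runs crawler, extractor and indexer on separate threads and returns the "index". */
    static Map<Path, String> run(List<Path> files) {
        // Bounded queues are the pipes; a full queue slows down the upstream stage.
        BlockingQueue<Path> contentExtractionQueue = new LinkedBlockingQueue<>(64);
        BlockingQueue<Extracted> indexWriterQueue = new LinkedBlockingQueue<>(64);
        Map<Path, String> index = new ConcurrentHashMap<>(); // stand-in for Lucene

        // Stage 1: the crawler. Here it only forwards a given list of files.
        Thread crawler = new Thread(() -> {
            try {
                for (Path file : files) {
                    contentExtractionQueue.put(file);
                }
                contentExtractionQueue.put(NO_MORE_FILES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: the extractor. A real one would call Apache Tika here.
        Thread extractor = new Thread(() -> {
            try {
                Path file = contentExtractionQueue.take();
                while (file != NO_MORE_FILES) {
                    String content = "content of " + file.getFileName(); // fake extraction
                    indexWriterQueue.put(new Extracted(file, content));
                    file = contentExtractionQueue.take();
                }
                indexWriterQueue.put(NO_MORE_CONTENT);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 3: the index writer. A real one would update the Lucene index.
        Thread indexer = new Thread(() -> {
            try {
                Extracted item = indexWriterQueue.take();
                while (item != NO_MORE_CONTENT) {
                    index.put(item.file, item.content);
                    item = indexWriterQueue.take();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        crawler.start();
        extractor.start();
        indexer.start();
        try {
            crawler.join();
            extractor.join();
            indexer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Pipeline was interrupted", e);
        }
        return index;
    }
}
```

The point of the design is that each stage only knows its input and output queue, so a slow stage (typically content extraction) can be scaled by starting more worker threads without touching the others.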

searcharchitecture

The frontend is a JavaFX/HTML5 hybrid. The search result page is generated as HTML5 by an embedded Jetty web server, with Freemarker as the templating engine. JSP is not an option because JSP pages have to be compiled at runtime, and a plain JRE does not ship a Java compiler. The interesting part is the JavaScript upcall from the HTML result. Every search result entry has an onclick JavaScript event handler. This event handler calls the DesktopGateway instance, which is provided by the JavaFX application itself. The JavaScript can now open local files by calling the DesktopGateway, which itself delegates to the Java Desktop.getDesktop().open() implementation.
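The gateway side of this upcall can be sketched roughly as follows. This is an illustration, not the project's actual class: the method name openFile and the JavaScript member name "desktop" are assumptions. The WebView wiring is shown only as comments because it needs the JavaFX modules; the gateway itself uses nothing but java.awt.Desktop.

```java
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;

// Hypothetical gateway object exposed to the page running inside the WebView.
// Note: for real WebView upcalls the class and the called method generally
// have to be public, so this class would live in its own file as a public class.
class DesktopGateway {

    /**
     * Intended to be called from JavaScript, for example:
     *   onclick="desktop.openFile('/path/to/file.pdf')"
     * Returns true if the file was handed over to the operating system.
     */
    public boolean openFile(String path) {
        File file = new File(path);
        if (!file.exists()) {
            return false; // nothing to open
        }
        if (!Desktop.isDesktopSupported()) {
            return false; // e.g. headless environment
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.OPEN)) {
            return false;
        }
        try {
            desktop.open(file); // delegate to the OS default application
            return true;
        } catch (IOException e) {
            return false; // no associated application or launch failed
        }
    }
}

// Wiring inside the JavaFX application (assumed member name "desktop"):
//
//   JSObject window = (JSObject) webView.getEngine().executeScript("window");
//   window.setMember("desktop", new DesktopGateway());
//
// It is usually safest to do this after the page has finished loading and to
// keep a strong reference to the gateway so it is not garbage collected.
```

Returning a boolean instead of throwing keeps the JavaScript side simple: the click handler can show a hint when the file is gone or cannot be opened.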

The JavaFX/HTML5 hybrid is a powerful combination. It lets us build cool user interfaces backed by the whole Java stack through the gateway approach described above. The HTML application could also be deployed standalone, without desktop interaction, for instance to support mobile devices such as tablets or smartphones.
