I’ve started working with Elasticsearch, and since I know Lucene internals quite well, I want to look at the index files that Elasticsearch generates.
There’s a tool called Luke (Lucene Index Toolbox) to do that. There are multiple repositories with Luke:
- Original (abandoned?)
- Dmitry Kan Fork
- Ľuboš Koščo Fork
But since Elasticsearch uses its own codecs, it’s not possible to open the index directly with any of these Luke builds.
To overcome this, the codecs need to be added to Luke.
I found an easy tutorial for this on Ross Simpson’s blog.
Here’s a copy of the how-to:
1. Clone Dmitry’s Mavenized fork:
$ git clone https://github.com/DmitryKey/luke/
2. Add a dependency on your required version of Elasticsearch to the Luke project’s pom file:
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch</artifactId>
  <version>1.3.0</version>
</dependency>
3. Compile the Luke jar file (creates target/luke-with-deps.jar):
$ mvn package
4. Unpack Luke’s list of known postings formats to a temporary file:
$ unzip target/luke-with-deps.jar META-INF/services/org.apache.lucene.codecs.PostingsFormat -d ./tmp/
Archive:  target/luke-with-deps.jar
  inflating: ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
5. Add the Elasticsearch postings formats to the temp file:
$ echo "org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.search.suggest.completion.Completion090PostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
6. Repack the modified file back into the jar:
$ jar -uf target/luke-with-deps.jar -C tmp/ META-INF/services/org.apache.lucene.codecs.PostingsFormat
7. Run Luke:
$ ./luke.sh
That’s it.
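One caveat with step 5: running the echo commands twice appends each class name twice, and if the unpacked file does not end with a newline, the first appended name ends up glued to the last existing line. Here is a small sketch of a helper that avoids both problems. The function name register_formats is my own invention and not part of Luke or Elasticsearch; it only relies on standard grep, tail and printf.

```shell
#!/bin/sh
# Hypothetical helper (not part of Luke or Elasticsearch):
# append each class name to a service file unless it is already listed.
# Usage: register_formats SERVICE_FILE CLASS_NAME...
register_formats() {
    services_file=$1
    shift

    # Make sure the file and its parent directory exist.
    mkdir -p "$(dirname "$services_file")"
    touch "$services_file"

    # If the file is non-empty and its last byte is not a newline,
    # add one so the next entry starts on its own line.
    if [ -s "$services_file" ] && [ -n "$(tail -c 1 "$services_file")" ]; then
        printf '\n' >> "$services_file"
    fi

    for class_name in "$@"; do
        # -q quiet, -x whole-line match, -F fixed string (no regex)
        if ! grep -qxF "$class_name" "$services_file"; then
            printf '%s\n' "$class_name" >> "$services_file"
        fi
    done
}
```

With the function defined, step 5 becomes a single call that is safe to repeat: register_formats ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat followed by the three class names from step 5 as arguments.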
And for all the lazy people out there, here’s the prepared Luke 4.9.0 with dependencies for Elasticsearch 1.3.0 (luke-with-deps-es-4.9.0).