July 2014


I’ve started working with Elasticsearch, and since I know Lucene internals quite well, I wanted to have a look at the index files that Elasticsearch generates.

There’s a tool called Luke (Lucene Index Toolbox) to do that. There are multiple repositories with Luke:

  1. Original (abandoned?)
  2. Dmitry Kan Fork
  3. Ľuboš Koščo Fork

But since Elasticsearch uses its own codecs, none of these Luke builds can open an Elasticsearch index directly.

To overcome this, the Elasticsearch classes need to be added to Luke and their postings formats registered in Luke’s list of known postings formats.

I found an easy tutorial for this on Ross Simpson’s blog.

Here’s a copy of the how-to:

1. Clone Dmitry’s Mavenized fork:

$ git clone https://github.com/DmitryKey/luke/

2. Add a dependency on your required version of Elasticsearch to the Luke project’s pom file:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>1.3.0</version>
</dependency>

3. Compile the Luke jar file (creates target/luke-with-deps.jar):

$ mvn package

4. Unpack Luke’s list of known postings formats to a temporary file:

$ unzip target/luke-with-deps.jar META-INF/services/org.apache.lucene.codecs.PostingsFormat -d ./tmp/
Archive:  target/luke-with-deps.jar
  inflating: ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat

5. Add the Elasticsearch postings formats to the temp file:

$ echo "org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
$ echo "org.elasticsearch.search.suggest.completion.Completion090PostingsFormat" \
    >> ./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
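
The echo commands above append unconditionally, so running the step twice duplicates the entries, and if the unpacked file does not end with a newline, the first new entry ends up glued to the last existing line. Here is a minimal sketch of a more forgiving variant, assuming a POSIX shell. The helper name `add_format` is made up for this example and is not part of Luke or Elasticsearch.

```shell
# Hypothetical helper: append a class name to a service file,
# but only if that exact line is not already present.
add_format() {
    file=$1
    class=$2

    # Create the file (and its directory) if it does not exist yet.
    mkdir -p "$(dirname "$file")"
    touch "$file"

    # If the file is non-empty and its last byte is not a newline,
    # terminate the last line first so the new entry starts on its own line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi

    # -x matches whole lines, -F treats the class name as a literal string.
    if ! grep -qxF "$class" "$file"; then
        printf '%s\n' "$class" >> "$file"
    fi
}

# Example, run from the Luke checkout in place of the echo commands:
#   f=./tmp/META-INF/services/org.apache.lucene.codecs.PostingsFormat
#   add_format "$f" org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat
#   add_format "$f" org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat
#   add_format "$f" org.elasticsearch.search.suggest.completion.Completion090PostingsFormat
```

Calling the helper again with the same class name leaves the file unchanged, so the step can be repeated safely.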

6. Repack the modified file back into the jar:

$ jar -uf target/luke-with-deps.jar -C tmp/ META-INF/services/org.apache.lucene.codecs.PostingsFormat

7. Run Luke:

$ ./luke.sh
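
If Luke still cannot open the index, it is worth checking that the repacked jar really lists the Elasticsearch postings formats. The following is only a sketch, assuming `unzip` is installed. `missing_formats` is a hypothetical helper invented for this example: it reads a service file on standard input and prints every expected class name that is not in it.

```shell
# Hypothetical helper (not part of Luke or Elasticsearch):
# reads a service file on stdin and prints each class name passed as an
# argument that does not appear as a complete line in that file.
missing_formats() {
    listing=$(cat)
    for class in "$@"; do
        printf '%s\n' "$listing" | grep -qxF "$class" || printf '%s\n' "$class"
    done
}

# Example, run from the Luke checkout after repacking the jar
# (unzip -p writes the named archive member to stdout):
#   unzip -p target/luke-with-deps.jar \
#       META-INF/services/org.apache.lucene.codecs.PostingsFormat |
#     missing_formats \
#       org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat \
#       org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat \
#       org.elasticsearch.search.suggest.completion.Completion090PostingsFormat
```

No output means all three entries made it into the jar. Any class name that is printed is missing from the jar's service file and needs to be added and repacked again.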

That’s it.
And for all the lazy people out there, here’s the prepared Luke 4.9.0 with dependencies for Elasticsearch 1.3.0 (luke-with-deps-es-4.9.0).