# elasticsearch-bolt

## Native vs REST APIs

This repository contains bolts for various Elasticsearch operations. Some bolts use the native API; others use the REST API. The native API is preferable because it supports connections to multiple nodes and round-robins requests across them. It is also faster. However, the REST API is battle-tested. Bolt names are suffixed with either `-native` or `-rest` depending on which API the bolt uses. For example, there are two bolts for writing: `elasticsearch-write-native` and `elasticsearch-write-rest`.
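Round robin means each request goes to the next node in the list, wrapping back to the first node after the last. The native client handles this internally. The plain-Clojure sketch below only illustrates the idea. `round-robin` is a hypothetical helper and is not part of this library.

```clojure
(defn round-robin
  "Returns a zero-argument function that yields the next host on each call,
  cycling through `hosts` in order. Hypothetical helper for illustration
  only; `hosts` must be non-empty."
  [hosts]
  (let [hosts   (vec hosts)
        counter (atom -1)]
    (fn []
      ;; swap! returns the incremented value, so the first call uses index 0
      (nth hosts (mod (swap! counter inc) (count hosts))))))

(def next-host (round-robin ["111.22.333.444:9300" "11.222.555.66:9300"]))
(next-host) ;; => "111.22.333.444:9300"
(next-host) ;; => "11.222.555.66:9300"
(next-host) ;; => "111.22.333.444:9300"
```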

Every bolt has a function for constructing it from a collector config. Bolts using the REST API require the following setting in the config:

- `ELASTICSEARCH_URL`: the URI of an Elasticsearch node, including port 9200.

Example:

```clojure
{"ELASTICSEARCH_URL" "http://127.0.0.1:9200"}
```
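Because each API expects different keys, checking a config before building a bolt can surface mistakes early. The sketch below covers the keys this README lists for both the REST and native APIs. `required-keys` and `missing-keys` are hypothetical helpers, not functions provided by this library.

```clojure
(require '[clojure.string :as str])

;; Config keys each API needs, as listed in this README.
(def required-keys
  {:rest   ["ELASTICSEARCH_URL"]
   :native ["ELASTICSEARCH_HOSTS" "ELASTICSEARCH_CLUSTER_NAME"]})

(defn missing-keys
  "Returns the keys required by `api` (:rest or :native) that are absent
  or blank in `config`. Hypothetical helper for illustration only."
  [api config]
  (vec (remove (fn [k]
                 ;; true (so the key is dropped) when a non-blank value exists
                 (let [v (get config k)]
                   (and (some? v)
                        (not (and (string? v) (str/blank? v))))))
               (get required-keys api))))

(missing-keys :rest {"ELASTICSEARCH_URL" "http://127.0.0.1:9200"})
;; => []
(missing-keys :native {"ELASTICSEARCH_HOSTS" "111.22.333.444:9300"})
;; => ["ELASTICSEARCH_CLUSTER_NAME"]
```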

Bolts using the native API require the following settings in the config:

- `ELASTICSEARCH_HOSTS`: comma-delimited IP addresses, each with port 9300. Note that the REST API uses port 9200, while the native API uses port 9300.
- `ELASTICSEARCH_CLUSTER_NAME`: the name of the cluster to connect to.

Example:

```clojure
{"ELASTICSEARCH_HOSTS" "111.22.333.444:9300,11.222.555.66:9300"
 "ELASTICSEARCH_CLUSTER_NAME" "elasticsearch"} 
```
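To make the expected `ELASTICSEARCH_HOSTS` format concrete, the sketch below splits the example value into host and port pairs. `parse-hosts` is a hypothetical helper, and the library's own parsing may differ. This sketch requires every entry to carry an explicit port.

```clojure
(require '[clojure.string :as str])

(defn parse-hosts
  "Splits a comma-delimited ELASTICSEARCH_HOSTS value into [host port]
  pairs, skipping empty entries. Every entry must carry an explicit port.
  Hypothetical helper; the library's own parsing may differ."
  [hosts-str]
  (vec
    (for [entry (str/split hosts-str #",")
          :let  [entry (str/trim entry)]
          :when (seq entry)
          :let  [[host port] (str/split entry #":")]]
      [host (Long/parseLong port)])))

(parse-hosts "111.22.333.444:9300,11.222.555.66:9300")
;; => [["111.22.333.444" 9300] ["11.222.555.66" 9300]]
```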

## Changelog

* Version 0.2.3
  * Renamed bolts and their corresponding construction functions to reflect which API is used (REST or native)
  * Added `elasticsearch-write-rest` bolt
  * Fixed bug for `elasticsearch-write-native` bolt
  * Fixed bug in bolt construction functions
  * Modified fixture for REST bolt tests to ensure the bolt is making the connection, rather than the fixture
  * Made sure all construction functions are tested

* Version 0.2.22
  * Fixed `mk-es-update-bolt`

* Version 0.2.2
  * Fixed connection issues; round robin now works
  * Project is public again

* Version 0.2.1
  * Expects the configuration field `ELASTICSEARCH_CLUSTER_NAME`, which the native driver needs to create a connection


## Bolts

### elasticsearch-write

There are two elasticsearch-write bolts: `elasticsearch-write-rest` and `elasticsearch-write-native`. Their behavior is the same. The only difference is the protocol used to do the write (REST vs native).

To use the `elasticsearch-write-rest` bolt in a topology:

```clojure
(topology
  {"spout" (spout-spec mock-write-spout)}
  {"bolt" (bolt-spec {"spout" elasticsearch-write-input-fields}
                     (mk-es-write-rest {"ELASTICSEARCH_URL" "http://127.0.0.1:9200"}))})
```

Input tuple:

```clojure
["meta" "index" "doctype" "docs"]
```

- `meta`: Passed through untouched, so you can carry data through the bolt for downstream processing that needs additional information
- `index`: The name of the index to write the documents to
- `doctype`: The doctype that each document being written belongs to
- `docs`: A collection of maps to write to Elasticsearch

Output tuple:

```clojure
["meta" "result"]
```

- `meta`: The input field `meta`, untouched
- `result`: A keyword representing the result of the write (`:success`)

Behavior:

Documents are written to Elasticsearch in a batch, and the input tuple is acked. Emitted tuples are anchored by default. If any document fails to write, the tuple is failed. Check the Elasticsearch logs for more detailed information about the failure.
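As a rough model of a batch write, the sketch below builds the alternating action and source entries used by Elasticsearch's bulk API. It then maps the bulk response's top-level `errors` flag to an ack or fail decision. Whether these bolts use the bulk API in exactly this way is an assumption. `bulk-entries` and `tuple-outcome` are hypothetical helpers, and serialization to newline-delimited JSON is left out.

```clojure
(defn bulk-entries
  "Returns the alternating action/source entries of an Elasticsearch bulk
  request that indexes every document in `docs` into `index`/`doctype`.
  Hypothetical helper: serializing to newline-delimited JSON is omitted."
  [index doctype docs]
  (vec (mapcat (fn [doc]
                 ;; one action line followed by the document source
                 [{:index {:_index index :_type doctype}} doc])
               docs)))

(defn tuple-outcome
  "Returns :ack when a parsed bulk response reports no errors, :fail
  otherwise. Assumes the response JSON was parsed with keyword keys."
  [bulk-response]
  (if (:errors bulk-response) :fail :ack))

(bulk-entries "tweets" "tweet" [{:user "a"} {:user "b"}])
;; => [{:index {:_index "tweets", :_type "tweet"}} {:user "a"}
;;     {:index {:_index "tweets", :_type "tweet"}} {:user "b"}]
(tuple-outcome {:errors false}) ;; => :ack
```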

### elasticsearch-update-with-script-rest

Supports the update-with-script flavor of Elasticsearch updates: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-update.html

To use the `elasticsearch-update-with-script-rest` bolt in a topology:

```clojure
(topology
  {"spout" (spout-spec mock-write-spout)}
  {"bolt" (bolt-spec {"spout" elasticsearch-update-with-script-input-fields}
                     (mk-es-update-with-script-bolt {"ELASTICSEARCH_URL" "http://127.0.0.1:9200"}))})
```

Input tuple:

```clojure
["meta" "index" "doctype" "id" "script" "params"]
```

- `meta`: Passed through untouched, so you can carry data through the bolt for downstream processing that needs additional information
- `index`: The name of the index containing the document
- `doctype`: The doctype of the document being updated
- `id`: The document's `_id` in Elasticsearch: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-id-field.html
- `script`: The update script, written in Groovy
- `params`: Parameters passed to `script`
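The input fields map onto a single call to the Elasticsearch update API. The sketch below describes that call as data, assuming the Elasticsearch 1.x `_update` body format, where `script` and `params` are top-level fields. Later versions nest them under a `script` object. `script-update-request` is a hypothetical helper, and ids containing special characters would also need URL encoding.

```clojure
(defn script-update-request
  "Describes the REST call for one update-with-script tuple, assuming the
  Elasticsearch 1.x `_update` body format. Hypothetical helper for
  illustration; ids are not URL-encoded here."
  [index doctype id script params]
  {:method :post
   :path   (str "/" index "/" doctype "/" id "/_update")
   :body   {:script script :params params}})

(script-update-request "counters" "counter" "1"
                       "ctx._source.count += n" {:n 4})
;; => {:method :post,
;;     :path "/counters/counter/1/_update",
;;     :body {:script "ctx._source.count += n", :params {:n 4}}}
```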

Output tuple:

```clojure
["meta" "result"]
```

- `meta`: The input field `meta`, untouched
- `result`: A keyword representing the result of the update `:success`

For more on Elasticsearch scripting, see: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html

Behavior:

The document specified by `id` is updated using `script` and `params`, and the input tuple is acked. The emitted tuple is anchored by default. If the update fails, the tuple is failed. Check the Elasticsearch logs for more detailed information about the failure.

### elasticsearch-update-rest

To use the `elasticsearch-update-rest` bolt in a topology:

```clojure
(topology
  {"spout" (spout-spec mock-write-spout)}
  {"bolt" (bolt-spec {"spout" elasticsearch-update-input-fields}
                     (mk-es-update-bolt {"ELASTICSEARCH_URL" "http://127.0.0.1:9200"}))})
```

Input tuple:

```clojure
["meta" "index" "doctype" "docs"]
```

- `meta`: Passed through untouched, so you can carry data through the bolt for downstream processing that needs additional information
- `index`: The name of the index to write the documents to
- `doctype`: The doctype that each document being written belongs to
- `docs`: A collection of maps to update or insert in Elasticsearch

Output tuple:

```clojure
["meta" "result"]
```

- `meta`: The input field `meta`, untouched
- `result`: A keyword representing the result of the update (`:success`)

Behavior:

If a document with the given id already exists, it is updated by merging in the new fields. If it does not exist, it is inserted.
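Assume the bolt issues partial-document updates with upsert enabled. In that case Elasticsearch merges objects recursively and replaces scalar values and arrays outright. The in-memory model below mimics those semantics so the merge behavior can be checked without a cluster. `deep-merge` and `upsert` are hypothetical helpers that never contact Elasticsearch.

```clojure
(defn deep-merge
  "Recursively merges maps; for any non-map value (including vectors),
  the right-hand side replaces the left."
  [a b]
  (if (and (map? a) (map? b))
    (merge-with deep-merge a b)
    b))

(defn upsert
  "Models update-or-insert on an in-memory map of id -> document: merges
  into an existing document, otherwise inserts. Hypothetical helper."
  [store id doc]
  (if (contains? store id)
    (update store id deep-merge doc)
    (assoc store id doc)))

(-> {}
    (upsert "1" {:name "a" :tags ["x"]})
    (upsert "1" {:tags ["y"] :meta {:seen true}}))
;; => {"1" {:name "a", :tags ["y"], :meta {:seen true}}}
```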

## Running tests

```
lein test
```
