# szew/fi

Index and compare file trees.

[![szew/fi](https://clojars.org/szew/fi/latest-version.svg)](https://clojars.org/szew/fi)

[API Codox][1]

## Why

I've been dogfooding my private Clo*j*ure toolbox (named `szew`) since 2012.

I'm splitting out and releasing its non-proprietary parts.

## What

It seems that comparing file trees is hard on bare Windows, especially when
you're not allowed to install anything or copy those files out anywhere.

It would also be nice to have a way of applying custom comparison heuristics,
ones that make it possible to ignore environment-dependent details such as
credentials or host names.

This is the 80/20 solution. It indexes and compares file trees using paths
relative to each tree's point of entry, and produces some crude reports:

* matches (tag v. tag) -- a hash comparison list, with a settable comparator.
* manifest (tag v. tag) -- a difference list, with a settable comparator.
* diff (tag v. tag) -- a single HTML page with the full diff in patch format.

**It's really BASIC. If you can use anything else, you should.**

## Short Example

Compare the contents of directories `A` and `B`.

```clojure
(require '[szew.fi :as fi])

(def config (fi/blue-leaves {:db-path    "/home/user/databases/bl"
                             :gz-path    "/home/user/databases/bl.gz"
                             :tags-paths {"A" "datasets/tree_root/A"
                                          "B" "datasets/tree_root/B"}
                             :matches    {["A" "B"] "A_B_matches.tsv"}
                             :manifests  {["A" "B"] "A_B_manifest.tsv"}
                             :html-diffs {["A" "B"] "A_B_diff.html"}}))

(fi/harvest-all! config)
;; => Some time will pass while it lists files
(fi/summary! config)
;; => Prints out some stats
(fi/write-all! config)
;; => Writes out all defined reports
```

You can plug in a custom harvester that can:

* select paths via the `:pre-select-fn` callable (`File --> bool`); if it
  returns `false` for a directory, that entire sub-tree is pruned.
* select files via the `:select-fn` callable (`File --> bool`), which receives
  every entry that survived pruning and makes the final decision.
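Both callables are ordinary predicates. Below is a minimal sketch, assuming the harvester hands each entry over as a `java.io.File` (as the `File --> bool` signature suggests). The names `skip-vcs-dirs` and `no-logs`, and the skipped directory names and suffixes, are hypothetical choices for illustration:

```clojure
(require '[clojure.string :as str])
(import '(java.io File))

;; Hypothetical pre-select predicate: returning false for one of these
;; directories prunes its whole sub-tree; everything else passes through.
(def skipped-dirs #{".git" ".svn" "target"})

(defn skip-vcs-dirs [^File f]
  (not (and (.isDirectory f)
            (contains? skipped-dirs (.getName f)))))

;; Hypothetical final selector: drop entries whose names end in a
;; log or temp-file suffix, keep the rest.
(def skipped-suffixes [".log" ".tmp"])

(defn no-logs [^File f]
  (let [n (.getName f)]
    (not-any? #(str/ends-with? n %) skipped-suffixes)))

;; Wiring them in (same shape as the ok-go example):
;; (fi/harvest {:pre-select-fn skip-vcs-dirs :select-fn no-logs})
```

Pruning in `:pre-select-fn` is the cheaper of the two, since a rejected directory is never walked at all.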

```clojure
(require '[szew.fi :as fi])

(defn ok-go [_]
  ;; should probably do something serious here with the _ File
  ;; this is basically (constantly true)
  true)

(def config (fi/blue-leaves {:proto-harvest (fi/harvest {:pre-select-fn ok-go
                                                         :select-fn     ok-go})
                             :db-path       "/home/user/databases/bl"
                             :gz-path       "/home/user/databases/bl.gz"
                             :tags-paths    {"A" "datasets/tree_root/A"
                                             "B" "datasets/tree_root/B"}
                             :matches       {["A" "B"] "A_B_matches.tsv"}
                             :manifests     {["A" "B"] "A_B_manifest.tsv"}
                             :html-diffs    {["A" "B"] "A_B_diff.html"}}))

(fi/harvest-all! config)
;; => Some time will pass while it lists files
(fi/summary! config)
;; => Prints out some stats

;; default fi comparator/advisor implementation:
(defn super-good-advice
  "Take an entry (it carries both tags), give Super Good Advice(TM)."
  [entry]
  (cond (and (= "BOTH" (:presence entry)) (empty? (:changed entry)))
        "IGNORE (Reason: No diff)"
        (= (:target_tag entry) (:presence entry))
        "INVESTIGATE (Reason: Only in target)"
        (= (:source_tag entry) (:presence entry))
        "DELETE (Reason: Not in target)"
        :else
        "INVESTIGATE (Reason: Diff in content)"))

;; Run with this advisor
(fi/write-all! config super-good-advice)
;; => Writes out all defined reports with explicit advisor
```
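Since an advisor is just a one-argument function from entry to advice string, it can be wrapped like any other function. A minimal sketch, where the `cautious` helper is hypothetical and relies only on the advice strings starting with a verb, as in `super-good-advice`:

```clojure
(require '[clojure.string :as str])

;; Hypothetical wrapper: takes any advisor (entry -> string) and returns
;; an advisor that never recommends deletion; DELETE becomes INVESTIGATE
;; while the reason text is kept as-is.
(defn cautious [advisor]
  (fn [entry]
    (let [advice (advisor entry)]
      (if (str/starts-with? advice "DELETE")
        (str "INVESTIGATE" (subs advice (count "DELETE")))
        advice))))
```

It would be passed wherever an advisor goes, e.g. `(fi/write-all! config (cautious super-good-advice))`.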

## Logging

This library uses `org.clojure/tools.logging` on top of `logback`. YMMV.

## License

Copyright © 2012-2016 Sławek Gwizdowski

MIT License; the full text can be found in the LICENSE file.

[1]: http://spottr.bitbucket.org/szew-fi/latest/

