# szew/io

File to data and back.

[![szew/io](https://clojars.org/szew/io/latest-version.svg)](https://clojars.org/szew/io)

[API Codox][latest], [CHANGELOG][changelog]

**Version 0.3.0 introduces breaking changes, please see CHANGELOG.**

## Why

I've been dogfooding my private Clo*j*ure toolbox (named `szew`) since 2012.

Splitting out and releasing non-proprietary parts.

## TL;DR

REPL oriented library for reading and writing common file formats.

Two concepts:

- `(Input/in! spec source)` parses the source, feeds the parsed data to the
user-provided `:processor`, then returns the processor's result; think:

```clojure
(with-open [r (reader source :encoding encoding)]
  (processor (parse r)))

```

- `(Output/sink spec target)` produces a callable that will consume a sequence,
writing it to target, then return nothing:

```clojure
(with-open [w (writer target :encoding encoding :append append)]
  (doseq [output (unparse a-seq)]
    (.write w output)))

```

You can use that callable as `:processor` to round-trip/convert files.

### Formats

Each constructor carries documentation for its spec, currently these are:

- `Lines`, constructed with `lines`

    * Input: text file in, sequence of Strings out
    * Output: sequence of Strings in, target text file populated

- `DSV` (D is for Delimiter), constructed with `csv` or `tsv`

    * Input: DSV in, sequence of vectors of Strings out
    * Output: sequence of vectors of Strings in, DSV target populated

- `FixedWidth`, constructed with `fixed-width`

    * Input: fixed width lines in, sequence of vectors of Strings out
    * Output: sequence of vectors of Strings in, fixed width file populated

- `XML`, constructed with `xml`

    * Input: XML in, `data.xml/parse` result out
    * Output: data in, `data.xml/emit` put in target

- `EDN`, constructed with `edn`

    * Input: EDN in, vector of read objects out.
    * Output: sequence of objects in, EDN written to file.

- `Files`, constructed with `files`

    * Input: file or directory in, sequence of files out
    * Output: N/A

- `Hasher`, constructed with `hasher`

    * Input: file in, hash out
    * Output: N/A

### Input processing

You prepare a processor: a data-eating function or a composition of such
functions. You shove that into a spec, and it is then fed data while your source
file is open. Just remember to let go of the head if you're short on memory!

```clojure
(require '[szew.io :as io])

(let [proc (partial into [] (comp (drop 2) (take 2)))]
  (println (io/in! (io/tsv {:processor proc}) "input.tsv")))

;; => displays vector of third and fourth rows of input.tsv


;; Direct call to the spec delegates to in!
(let [proc (partial into [] (comp (drop 2) (take 2)))
      tsv! (io/tsv {:processor proc})]
  (println (tsv! "input.tsv")))

;; => displays vector of third and fourth rows of input.tsv

```
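The advice about letting go of the head can be illustrated in plain Clojure, without `szew.io`. This is a minimal sketch, assuming the processor receives a lazy sequence that is only readable while the file is open (as the TL;DR sketch suggests). The names `with-lines` and `count-long-lines` are hypothetical and not part of the library.

```clojure
(require '[clojure.java.io :as jio])

;; Hypothetical stand-in for the in! pattern:
;; open the source, parse lazily, hand the sequence to the processor.
(defn with-lines [processor source]
  (with-open [r (jio/reader source :encoding "UTF-8")]
    (processor (line-seq r))))

;; Head-friendly processor: reduce visits each line once
;; and keeps only a running count, not the lines themselves.
(defn count-long-lines [lines]
  (reduce (fn [n line] (if (> (count line) 3) (inc n) n)) 0 lines))

;; Demo input in a temporary file.
(def tmp (doto (java.io.File/createTempFile "szew-demo" ".txt")
           (.deleteOnExit)))
(spit tmp "one\nthree\nfive\nseventeen\n")

(def long-line-count (with-lines count-long-lines tmp))
;; long-line-count => 3
```

Eager processors such as `reduce` or `(partial into [])` finish their work before the reader is closed. A processor that returns an unrealized lazy sequence would, in this sketch, be read only after the file was already closed.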

### Output processing

On the other hand, you've got output sink creators. These accept a spec
and a path, giving you a callable that will consume a sequence and dump it into
the target output file.

```clojure
(require '[szew.io :as io])

(let [sink (io/sink (io/csv) "out.csv")]
  (io/in! (io/tsv {:processor sink}) "input.tsv"))

;; => returns nil, converts TSV into CSV

```
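The sink idea itself is just a closure over the target. The following is a rough sketch in plain Clojure, not the library's implementation; `line-sink` is a hypothetical name, and writing one string per line is an assumption made for the demo.

```clojure
(require '[clojure.java.io :as jio]
         '[clojure.string :as str])

;; Hypothetical sink constructor: returns a function that writes a
;; sequence of strings to target, one per line, and returns nil.
(defn line-sink [target]
  (fn [lines]
    (with-open [^java.io.Writer w (jio/writer target :encoding "UTF-8")]
      (doseq [line lines]
        (.write w (str line "\n"))))))

(def dst (doto (java.io.File/createTempFile "szew-out" ".txt")
           (.deleteOnExit)))

;; A sink is an ordinary function, so it composes with transformations
;; and can sit at the end of any processor.
(def result
  ((comp (line-sink dst) (partial map str/upper-case)) ["a" "b" "c"]))
;; result => nil, dst now holds three upper-cased lines
```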

## Usage

Simple composed partials:

```clojure
(require '[szew.io :as io])
(require '[szew.io.util :as io.util])

;; A seq of lines from input.txt, processed with composed
;; functions and written to out.txt

(def p (comp (io/sink (io/lines) "out.txt")
             (partial take 10)
             (partial filter true?)
             (partial map #(or % false))
             (partial drop 1)))

(io/in! (io/lines {:processor #'p}) "input.txt")

;; A seq of vectors from in.csv, processed with composed functions
;; and written to out.tsv
(let [adj (io.util/row-adjuster ["default #1" "default #2" "default #3"])
      out (io/sink (io/tsv) "out.tsv")
      pro (comp out
                (partial cons ["col #1" "col #2" "col #3"])
                (partial map adj)
                (partial take 10)
                (partial filter true?)
                (partial map #(or % false))
                (partial drop 1))]
  (io/in! (io/csv {:processor pro, :strict true}) "input.csv"))

```
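When reading pipelines like the ones above, keep in mind that `comp` applies its functions right to left, so the last `partial` listed runs first. A small sketch using only `clojure.core` (the pipeline steps are made up for illustration) shows the same steps written as composed partials and as a transducer:

```clojure
;; comp applies right to left: (partial drop 1) runs first, take runs last.
(def pipeline
  (comp (partial take 2)
        (partial map inc)
        (partial drop 1)))

(def via-partials (vec (pipeline [1 2 3 4 5])))
;; drop 1 -> (2 3 4 5), map inc -> (3 4 5 6), take 2 -> (3 4)

;; Composed transducers read the other way round: steps are listed
;; in the order in which each element passes through them.
(def xf (comp (drop 1) (map inc) (take 2)))

(def via-transducer (into [] xf [1 2 3 4 5]))
;; both results are [3 4]
```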

### Utilities

The `szew.io.util` namespace contains several convenience functions to ease
converting between vectors and maps and back. It also contains several other
little helpers, like `juxt-map`, `getter`, `friendlify`, `roll-in`
and `roll-out`. There's also `deep-sort`, which tries to make maps and
sets ordered recursively. Most importantly, it contains `ppmap` and `ppfilter`,
which are parametrized multi-threaded versions of `map` and `filter`, including
transducer functionality.
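
To give a feel for the vector-to-map transition mentioned above, here is a sketch using only `clojure.core`. The functions `rows->maps` and `maps->rows` are hypothetical illustrations and are not the `szew.io.util` API; see the API docs for the real helpers and their signatures.

```clojure
;; Hypothetical helper: treat the first row as a header and
;; turn every following row into a map keyed by that header.
(defn rows->maps [[header & rows]]
  (let [ks (mapv keyword header)]
    (map (partial zipmap ks) rows)))

;; Hypothetical helper: go back to rows, with an explicit key order
;; so that the column order is deterministic.
(defn maps->rows [ks maps]
  (cons (mapv name ks)
        (map (apply juxt ks) maps)))

(def rows [["id" "name"] ["1" "Ada"] ["2" "Bob"]])

(def as-maps (rows->maps rows))
;; ({:id "1", :name "Ada"} {:id "2", :name "Bob"})

(def back (maps->rows [:id :name] as-maps))
;; (["id" "name"] ["1" "Ada"] ["2" "Bob"])
```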

## Development

From version 0.3.0 `szew.io` depends on Clojure 1.9.0 and `spec.alpha`.

Non-core libraries wrapped:

* [clojure-csv][clojure-csv]
* [org.clojure/data.xml][data.xml]
* [org.clojure/data.zip][data.zip]
* [camel-snake-kebab][csk]

Testing is done with [eftest][eftest].

## License

Copyright © 2012-2019 Sławek Gwizdowski

MIT License, text can be found in the LICENSE file.

[latest]: http://spottr.bitbucket.io/szew-io/latest/
[changelog]: CHANGELOG.md
[clojure-csv]: https://github.com/davidsantiago/clojure-csv
[data.xml]: https://github.com/clojure/data.xml
[data.zip]: https://github.com/clojure/data.zip
[csk]: https://github.com/qerub/camel-snake-kebab
[eftest]: https://github.com/weavejester/eftest
