clojisr.v1.tutorial-test

clojisr.v1.tutorial-test - created by notespace, Mon Feb 24 16:57:38 CET 2020.
Checks: 58 PASSED

Clojisr tutorial

Basic examples

Let us start with some basic usage examples of Clojisr.

(require
   '[clojisr.v1.r :as r :refer
     [r eval-r->java r->java java->r
      java->clj java->naive-clj
      clj->java r->clj clj->r ->code r+
      colon]]
   '[clojisr.v1.require :refer
     [require-r]]
   '[clojisr.v1.robject :as robject]
   '[clojisr.v1.session :as session]
   '[clojisr.v1.rserve :as rserve]
   '[tech.ml.dataset :as dataset])

First, let us make sure that we use the Rserve backend (in case we were using Renjin instead earlier), and that there are no R sessions currently running.

(rserve/set-as-default!)
 (r/discard-all-sessions)

Now let us run some R code, and keep a Clojure handle to the return value.

(def x (r "1+2"))

Convert the R return value to Clojure data. (This part requires more thorough documentation.)

(->> x
      r->clj
      (check = [3.0]))
[:PASSED [3.0]]
(->>
   "list(A=1,B=2,'#123strange ()'=3)"
   r
   r->clj
   (check =
          {:A [1.0],
           :B [2.0],
           "#123strange ()"
             [3.0]}))
[:PASSED {:A [1.0], :B [2.0], "#123strange ()" [3.0]}]

Run some code in a separate session (specified by an Rserve port other than the default one).

(-> "1+2"
     (r :session-args {:port 4444})
     r->clj
     (->> (check = [3.0])))
[:PASSED [3.0]]

Convert Clojure data to R data. Note that nil is turned into NA.

(-> [1 nil 3]
     clj->r)
[1]  1 NA  3

Functions

An R function is also a Clojure function.

(def f (r "function(x) x*10"))

Let us apply it to Clojure data (implicitly converting that data to R).

(->> 5
      f
      r->clj
      (check = [50.0]))
[:PASSED [50.0]]

We can also apply it to R data.

(->> "5*5"
      r
      f
      r->clj
      (check = [250.0]))
[:PASSED [250.0]]

Functions can take named arguments. Here we pass the na.rm argument, which tells R whether to remove missing values when computing the mean.

(->> ((r "mean") [1 nil 3] :na.rm true)
      r->clj
      (check = [2.0]))
[:PASSED [2.0]]

An alternative call syntax:

(->> ((r "mean")
        [1 nil 3]
        [:= :na.rm true])
      r->clj
      (check = [2.0]))
[:PASSED [2.0]]

Another example:

(let
   [f (r
        "function(w,x,y=10,z=20) w+x+y+z")]
   (->> [(f 1 2) (f 1 2 :y 100)
         (f 1 2 :z 100)]
        (map r->clj)
        (check =
               [[33.0] [123.0]
                [113.0]])))
[:PASSED ([33.0] [123.0] [113.0])]
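As a further illustration of the two call syntaxes shown above, the following sketch applies them to base R's round function and its digits argument. The name r-round is only a local choice for this example, and the snippet assumes a running R session as set up at the beginning of this tutorial.

```clojure
(require '[clojisr.v1.r :refer [r r->clj]])

;; A handle to base R's round function (r-round is a local name).
(def r-round (r "round"))

;; Named argument passed as a keyword-value pair.
(def rounded-a (r->clj (r-round 3.14159 :digits 2)))

;; The same named argument passed in the [:= name value] form.
(def rounded-b (r->clj (r-round 3.14159 [:= :digits 2])))

;; Both syntaxes are expected to give the same result.
(= rounded-a rounded-b)
```

Which of the two forms to use is mostly a matter of taste; the vector form may read better when the argument list is built programmatically.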

Some functions are already created in Clojisr and given special names for convenience. For example:

(->> (r+ 1 2 3)
      r->clj
      (check = [6]))
[:PASSED [6]]
(->> (colon 0 9)
      r->clj
      (check = (range 10)))
[:PASSED [0 1 2 3 4 5 6 7 8 9]]

R dataframes and tech.ml.dataset datasets

Create a tech.ml.dataset dataset object, pass it to an R function to compute the row means, and convert the return value to Clojure.

(let
   [row-means
      (r
        "function(data) rowMeans(data)")]
   (->> {:x [1 2 3], :y [4 5 6]}
        dataset/name-values-seq->dataset
        row-means
        r->clj
        (check = [2.5 3.5 4.5])))
[:PASSED [2.5 3.5 4.5]]

Load the R package 'dplyr' (assuming it is installed).

(r "library(dplyr)")

Use dplyr to process a Clojure dataset, and convert the result back to a Clojure dataset.

(let
   [filter-by-x
      (r
        "function(data) filter(data, x>=2)")
    add-z-column
      (r
        "function(data) mutate(data, z=x+y)")]
   (->>
     {:x [1 2 3], :y [4 5 6]}
     dataset/name-values-seq->dataset
     filter-by-x
     add-z-column
     r->clj
     (check
       (fn [d]
         (->
           d
           dataset/->flyweight
           (= [{:x 2.0, :y 5.0, :z 7.0}
               {:x 3.0,
                :y 6.0,
                :z 9.0}]))))))
[:PASSED _unnamed [2 3]:

|    :x |    :y |    :z |
|-------+-------+-------|
| 2.000 | 5.000 | 7.000 |
| 3.000 | 6.000 | 9.000 |
]

Tibbles are also supported, as a special case of data frames.

(r "library(tibble)")
(let [tibble (r "tibble")]
   (tibble :x [1 2 3] :y [4 5 6]))
# A tibble: 3 x 2
      x     y
   
1     1     4
2     2     5
3     3     6
(let [tibble (r "tibble")]
   (->> (tibble :x [1 2 3] :y [4 5 6])
        r->clj
        dataset/->flyweight
        (check =
               [{:x 1.0, :y 4.0}
                {:x 2.0, :y 5.0}
                {:x 3.0, :y 6.0}])))
[:PASSED ({:x 1.0, :y 4.0} {:x 2.0, :y 5.0} {:x 3.0, :y 6.0})]

R objects

Clojisr holds handles to R objects, which are stored in memory in the R session, where they are assigned random names.

(def one+two (r "1+2"))
(->> one+two
      class
      (check
        =
        clojisr.v1.robject.RObject))
[:PASSED clojisr.v1.robject.RObject]
(:object-name one+two)
"x0294873826bb4b53"

We can figure out the place in R memory corresponding to an object's name.

(->
   one+two
   :object-name
   clojisr.v1.objects-memory/object-name->memory-place)
".MEM$x0294873826bb4b53"

Generating code

Let us see the code-generation mechanism of Clojisr, and the rules defining it.

We will need a reference to the R session:

(def session
   (session/fetch-or-make nil))

For the following examples, we will use some dummy handles to R objects:

(def x
   (robject/->RObject "x"
                      session
                      nil
                      nil))
 (def y
   (robject/->RObject "y"
                      session
                      nil
                      nil))

... and some real handles to R objects:

(def minus-eleven (r "-11"))
 (def abs (r "abs"))

For an R object, we generate the code with that object's location in the R session memory.

(->> x
      ->code
      (check = ".MEM$x"))
[:PASSED ".MEM$x"]

For a Clojure value, we implicitly convert to an R object, generating the corresponding code.

(->> "hello"
      ->code
      (check re-matches #"\.MEM\$.*"))
[:PASSED ".MEM$xf5ae030e5ad644bb"]

For a symbol, we generate the code with the corresponding R symbol.

(->code 'x)
"x"

A sequential structure (list, vector, etc.) can be interpreted as a compound expression, for which code generation is defined according to the first element.

For a list beginning with the symbol 'function, we generate an R function definition.

(->> '(function [x y] x)
      ->code
      (check = "function(x, y) {x}"))
[:PASSED "function(x, y) {x}"]

For a vector instead of a list, we have the same behaviour.

(->> '[function [x y] x]
      ->code
      (check = "function(x, y) {x}"))
[:PASSED "function(x, y) {x}"]

For a list beginning with the symbol 'tilde, we generate an R ~-formula.

(->> '(tilde x y)
      ->code
      (check = "(x ~ y)"))
[:PASSED "(x ~ y)"]

For a list beginning with a symbol known to be a binary operator, we generate the code with that operator between all arguments.

(->> '(+ x y z)
      ->code
      (check = "(x + y + z)"))
[:PASSED "(x + y + z)"]

For a list beginning with another symbol, we generate a function call with that symbol as the function name.

(->> '(f x)
      ->code
      (check = "f(x)"))
[:PASSED "f(x)"]

For a list beginning with an R object that is a function, we generate a function call with that object as the function.

(->> [abs 'x]
      ->code
      (check re-matches
             #"\.MEM\$.*\(x\)"))
[:PASSED ".MEM$xef682f053fe043d9(x)"]

All other sequential things (that is, those not beginning with a symbol or R function) are interpreted as data, converted implicitly to R data.

(->> [abs '(1 2 3)]
      ->code
      (check
        re-matches
        #"\.MEM\$.*\(\.MEM\$.*\)"))
[:PASSED ".MEM$xef682f053fe043d9(.MEM$xb7ca9a24cd4e48ef)"]

Some more examples, showing how these rules compose:

(->code '(function [x y] (f y)))
"function(x, y) {f(y)}"
(->code '(function [x y] (+ x y)))
"function(x, y) {(x + y)}"
(->code ['function '[x y] ['+ 'x y]])
"function(x, y) {(x + .MEM$y)}"
(->code
   '(function [x y] (print x) (f x)))
"function(x, y) {print(x); f(x)}"
(->code ['function '[x y] [abs 'x]])
"function(x, y) {.MEM$xef682f053fe043d9(x)}"
(->code [abs minus-eleven])
".MEM$xef682f053fe043d9(.MEM$x0b5f4904bfcd487e)"
(->code [abs -11])
".MEM$xef682f053fe043d9(.MEM$x1a18b3529b1349e7)"
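To make the dispatch on the first element more concrete, here is a toy imitation in plain Clojure. It is not Clojisr's implementation: toy->code and binary-operators are hypothetical names, the operator set is chosen only for illustration, and the sketch handles just symbols, numbers, 'function, 'tilde, binary operators and plain function calls (no R objects, no implicit data conversion).

```clojure
(require '[clojure.string :as string])

;; Hypothetical operator set, for illustration only.
(def binary-operators '#{+ - * / >= <= > < ==})

(defn toy->code
  "Toy imitation of the code-generation rules; NOT Clojisr's implementation."
  [form]
  (cond
    ;; A symbol becomes the R symbol of the same name.
    (symbol? form) (name form)
    ;; A number is written literally (the real library converts data instead).
    (number? form) (str form)
    ;; A list or vector is dispatched on its first element.
    (sequential? form)
    (let [[head & args] form]
      (cond
        (= head 'function)
        (let [[params & body] args]
          (str "function("
               (string/join ", " (map toy->code params))
               ") {"
               (string/join "; " (map toy->code body))
               "}"))

        (= head 'tilde)
        (str "(" (string/join " ~ " (map toy->code args)) ")")

        (contains? binary-operators head)
        (str "("
             (string/join (str " " head " ") (map toy->code args))
             ")")

        ;; Any other head is treated as the function of a call.
        :else
        (str (toy->code head)
             "("
             (string/join ", " (map toy->code args))
             ")")))
    :else (throw (ex-info "Unsupported form in this toy sketch"
                          {:form form}))))

(toy->code '(function [x y] (print x) (f x)))
;; => "function(x, y) {print(x); f(x)}"
```

For the symbol-only forms above, the toy reproduces the strings shown in this section; anything involving R objects or Clojure data needs the real ->code.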

Running generated code

Clojure forms can be run as R code. For example:

(->> [abs (range -3 0)]
      r
      r->clj
      (check = [3 2 1]))
[:PASSED [3 2 1]]

Let us repeat the basic examples from the beginning of this tutorial, this time generating code rather than writing it as strings.

(def x (r '(+ 1 2)))
"checking again... "
 (->> x
      r->clj
      (check = [3]))
[:PASSED [3]]
(def f (r '(function [x] (* x 10))))
"checking again... "
 (->> 5
      f
      r->clj
      (check = [50]))
[:PASSED [50]]
"checking again... "
 (->> "5*5"
      r
      f
      r->clj
      (check = [250.0]))
[:PASSED [250.0]]
(let [row-means (r '(function
                       [data]
                       (rowMeans
                         data)))]
   (->> {:x [1 2 3], :y [4 5 6]}
        dataset/name-values-seq->dataset
        row-means
        r->clj
        (check = [2.5 3.5 4.5])))
[:PASSED [2.5 3.5 4.5]]
(r '(library dplyr))
(let [filter-by-x (r '(function
                         [data]
                         (filter data
                           (>= x 2))))
       add-z-column
         (r '(function
               [data]
               (mutate data
                       (= z (+ x y)))))]
   (->> {:x [1 2 3], :y [4 5 6]}
        dataset/name-values-seq->dataset
        filter-by-x
        add-z-column
        r->clj))
_unnamed [2 3]:

|    :x |    :y | (z = (x + y)) |
|-------+-------+---------------|
| 2.000 | 5.000 |         7.000 |
| 3.000 | 6.000 |         9.000 |

The strange column name is due to dplyr's mutate behaviour when extra parens are added to the expression.

Requiring R packages

Sometimes, we want to bring functions and data from R packages into the Clojure world. Here, we try to follow the require-python syntax of libpython-clj (though currently in a less sophisticated way).

(require-r '[stats :as statz :refer
              [median]])
(->> [1 2 3]
      r.stats/median
      r->clj
      (check = [2]))
[:PASSED [2]]
(->> [1 2 3]
      statz/median
      r->clj
      (check = [2]))
[:PASSED [2]]
(->> [1 2 3]
      median
      r->clj
      (check = [2]))
[:PASSED [2]]
(require-r '[datasets :as datasetz
              :refer [euro]])
(->> [r.datasets/euro datasetz/euro
       euro]
      (check apply =))
[:PASSED
 [        ATS         BEF         DEM         ESP         FIM         FRF 
  13.760300   40.339900    1.955830  166.386000    5.945730    6.559570 
        IEP         ITL         LUF         NLG         PTE 
   0.787564 1936.270000   40.339900    2.203710  200.482000 

          ATS         BEF         DEM         ESP         FIM         FRF 
  13.760300   40.339900    1.955830  166.386000    5.945730    6.559570 
        IEP         ITL         LUF         NLG         PTE 
   0.787564 1936.270000   40.339900    2.203710  200.482000 

          ATS         BEF         DEM         ESP         FIM         FRF 
  13.760300   40.339900    1.955830  166.386000    5.945730    6.559570 
        IEP         ITL         LUF         NLG         PTE 
   0.787564 1936.270000   40.339900    2.203710  200.482000 
]]
(require-r '[base :refer [$]])
(-> {:a 1, :b 2}
     ($ 'a)
     r->clj
     (->> (check = [1])))
[:PASSED [1]]

Data visualization

Functions that create R plots, as well as plot objects generated by various R libraries, can be wrapped so that the result is returned as an SVG string or a BufferedImage, or saved to a file. All of these wrappers accept the additional parameters specified in the grDevices R package.

Currently there is a bug that sometimes causes axes and labels to disappear when the plot is rendered inside a larger HTML page.

(require-r '[graphics :refer
              [plot hist]])
 (require-r '[ggplot2 :refer
              [ggplot aes geom_point
               xlab ylab labs]])
 (require
   '[clojisr.v1.applications.plotting
     :refer
     [plot->svg plot->file
      plot->buffered-image]])

A first example: a simple plotting function rendered as an SVG string.

(plot->svg (fn []
              (->> rand
                   (repeatedly 30)
                   (reductions +)
                   (plot :xlab "t"
                         :ylab "y"
                         :type "l"))))

ggplot2 plots (or any other plot objects, such as lattice ones) can also be turned into SVG.

(plot->svg
   (let [x (repeatedly 99 rand)
         y (map +
             x
             (repeatedly 99 rand))]
     (->
       {:x x, :y y}
       dataset/name-values-seq->dataset
       (ggplot (aes :x x
                    :y y
                    :color '(+ x y)
                    :size '(/ x y)))
       (r+ (geom_point)
           (xlab "x")
           (ylab "y")))))

Any plot (function or object) can be saved to a file or converted to a BufferedImage object.

(r->clj
   (plot->file
     (str target-path "/histogram.jpg")
     (fn [] (hist [1 1 1 1 2 3 4 5]))
     :width 800
     :height 400
     :quality 50))
{:breaks [1 2 3 4 5],
 :counts [5 1 1 1],
 :density [0.625 0.125 0.125 0.125],
 :mids [1.5 2.5 3.5 4.5],
 :xname [".MEM$xe300244f864840ef"],
 :equidist [true]}
(plot->buffered-image
   (fn [] (hist [1 1 1 1 2 3 4 5]))
   :width 222
   :height 149)
#object[java.awt.image.BufferedImage 0x6f7469ca "BufferedImage@6f7469ca: type = 2 DirectColorModel: rmask=ff0000 gmask=ff00 bmask=ff amask=ff000000 IntegerInterleavedRaster: width = 222 height = 149 #Bands = 4 xOff = 0 yOff = 0 dataOffset[0] 0"]

Intermediary representation as Java objects

Clojisr relies on an intermediary representation of R data as Java objects. This is usually hidden from the user, but may be useful sometimes. In the current implementation, this is based on REngine.

(import
   (org.rosuda.REngine REXP
                       REXPInteger
                       REXPDouble))

We can convert data between R and Java.

(->> "1:9"
      r
      r->java
      class
      (check = REXPInteger))
[:PASSED org.rosuda.REngine.REXPInteger]
(->> (REXPInteger. 1)
      java->r
      r->clj
      (check = [1]))
[:PASSED [1]]

We can further convert data from the Java representation to Clojure.

(->> "1:9"
      r
      r->java
      java->clj
      (check = (range 1 10)))
[:PASSED [1 2 3 4 5 6 7 8 9]]

In the opposite direction, we can also convert Clojure data into the Java representation.

(->> (range 1 10)
      clj->java
      class
      (check = REXPInteger))
[:PASSED org.rosuda.REngine.REXPInteger]
(->> (range 1 10)
      clj->java
      java->clj
      (check = (range 1 10)))
[:PASSED [1 2 3 4 5 6 7 8 9]]

There is an alternative way of conversion from Java to Clojure, naively converting the internal Java representation to a Clojure data structure. It can be handy when one wants to have plain access to all the metadata (R attributes), etc.

(->> "1:9"
      r
      r->java
      java->naive-clj)
{:attr nil, :value [1, 2, 3, 4, 5, 6, 7, 8, 9]}
(->>
   "data.frame(x=1:3,y=factor('a','a','b'))"
   r
   r->java
   java->naive-clj)
{:attr
 {:names ["x", "y"],
  :row.names [-2147483648, -3],
  :class ["data.frame"]},
 :value {:x [1, 2, 3], :y ["b", "b", "b"]}}

We can evaluate R code and immediately return the result as a Java object, without ever creating a handle to an R object holding the result:

(->> "1+2"
      eval-r->java
      class
      (check = REXPDouble))
[:PASSED org.rosuda.REngine.REXPDouble]
(->> "1+2"
      eval-r->java
      (.asDoubles)
      vec
      (check = [3.0]))
[:PASSED [3.0]]

More data conversion examples

Conversion between R and Clojure always passes through Java. To stress this, we write it explicitly in the following examples.

(->> "list(a=1:2,b='hi!')"
      r
      r->java
      java->clj
      (check = {:a [1 2], :b ["hi!"]}))
[:PASSED {:a [1 2], :b ["hi!"]}]
(->>
   "table(c('a','b','a','b','a','b','a','b'), c(1,1,2,2,3,3,1,1))"
   r
   r->java
   java->clj
   (check =
          {["1" "a"] 2,
           ["1" "b"] 2,
           ["2" "a"] 1,
           ["2" "b"] 1,
           ["3" "a"] 1,
           ["3" "b"] 1}))
[:PASSED
 {["1" "a"] 2,
  ["1" "b"] 2,
  ["2" "a"] 1,
  ["2" "b"] 1,
  ["3" "a"] 1,
  ["3" "b"] 1}]
(->> {:a [1 2], :b "hi!"}
      clj->java
      java->r
      r->java
      java->clj
      (check = {:a [1 2], :b ["hi!"]}))
[:PASSED {:a [1 2], :b ["hi!"]}]
(->> {:a [1 2], :b "hi!"}
      clj->java
      java->r
      ((r "deparse"))
      r->java
      java->clj)
["list(a = 1:2, b = \"hi!\")"]

Basic types conversion clj->r->clj

(def clj->r->clj (comp r->clj r))
#'clojisr.v1.tutorial-test/clj->r->clj
(check = (clj->r->clj nil) nil)
[:PASSED nil]
(check =
        (clj->r->clj [10 11])
        [10 11])
[:PASSED [10 11]]
(check =
        (clj->r->clj [10.0 11.0])
        [10.0 11.0])
[:PASSED [10.0 11.0]]
(check =
        (clj->r->clj (list 10.0 11.0))
        [10.0 11.0])
[:PASSED [10.0 11.0]]
(check =
        (clj->r->clj {:a 1, :b 2})
        {:a [1], :b [2]})
[:PASSED {:a [1], :b [2]}]

Various R objects

Named list

(->> (r "list(a=1,b=c(10,20),c='hi!')") ;; named
                                         ;; list
      r->clj
      (check =
             {:a [1.0],
              :b [10.0 20.0],
              :c ["hi!"]}))
[:PASSED {:a [1.0], :b [10.0 20.0], :c ["hi!"]}]

Array of doubles

(->> (r "c(10,20,30)") ;; array of
                        ;; doubles
      r->clj
      (check = [10.0 20.0 30.0]))
[:PASSED [10.0 20.0 30.0]]

Timeseries

(->> (r r.datasets/euro) ;; timeseries
      r->clj
      first
      (check = 13.7603))
[:PASSED 13.7603]

Pairlist

(->> (r.base/formals r.stats/dnorm) ;; pairlist
      r->clj
      keys
      sort
      (check = '(:log :mean :sd :x)))
[:PASSED (:log :mean :sd :x)]

NULL

(->> (r "NULL") ;; null
      r->clj
      (check = nil))
[:PASSED nil]

TRUE/FALSE

(->> (r "TRUE") ;; true/false
      r->clj
      (check = [true]))
[:PASSED [true]]

Inspecting R functions

The mean function is defined to expect arguments x and .... These arguments have no default values (thus, its formals have empty symbols as values):

(->> 'mean
      r.base/formals
      r->clj
      (check =
             {:x (symbol ""),
              :... (symbol "")}))
[:PASSED {:x , :... }]

It is an S3 generic function, which we can see by printing it:

(r 'mean)
function (x, ...) 
UseMethod("mean")

So, we can expect possibly more details when inspecting its default implementation. Now, we see some arguments that do have default values.

(->> 'mean.default
      r.base/formals
      r->clj
      (check =
             {:x (symbol ""),
              :trim [0.0],
              :na.rm [false],
              :... (symbol "")}))
[:PASSED {:x , :trim [0.0], :na.rm [false], :... }]

R-function-arglists

As we saw earlier, R functions are Clojure functions. The arglists of functions brought up by require-r match the expected arguments. Here are some examples:

(require-r '[base]
            '[stats]
            '[grDevices])
(->>
   [#'r.base/mean #'r.base/mean-default
    #'r.stats/arima0
    #'r.grDevices/dev-off
    #'r.base/Sys-info
    #'r.base/summary-default
    ;; Primitive functions:
    #'r.base/sin #'r.base/sum]
   (map
     (fn [f]
       (-> f
           meta
           (update :ns
                   (comp symbol str)))))
   (check
     =
     '({:arglists ([x & {:keys [...]}]),
        :name mean,
        :ns r.base}
       {:arglists ([x &
                    {:keys [trim na.rm
                            ...]}]),
        :name mean-default,
        :ns r.base}
       {:arglists
          ([x &
            {:keys [order seasonal xreg
                    include.mean delta
                    transform.pars fixed
                    init method n.cond
                    optim.control]}]),
        :name arima0,
        :ns r.stats}
       {:arglists ([& {:keys [which]}]),
        :name dev-off,
        :ns r.grDevices}
       {:arglists ([]),
        :name Sys-info,
        :ns r.base}
       {:arglists
          ([object &
            {:keys [... digits
                    quantile.type]}]),
        :name summary-default,
        :ns r.base}
       {:arglists ([x]),
        :name sin,
        :ns r.base}
       {:arglists
          ([& {:keys [... na.rm]}]),
        :name sum,
        :ns r.base})))
[:PASSED
 ({:arglists ([x & {:keys [...]}]), :name mean, :ns r.base}
  {:arglists ([x & {:keys [trim na.rm ...]}]),
   :name mean-default,
   :ns r.base}
  {:arglists
   ([x
     &
     {:keys
      [order
       seasonal
       xreg
       include.mean
       delta
       transform.pars
       fixed
       init
       method
       n.cond
       optim.control]}]),
   :name arima0,
   :ns r.stats}
  {:arglists ([& {:keys [which]}]), :name dev-off, :ns r.grDevices}
  {:arglists ([]), :name Sys-info, :ns r.base}
  {:arglists ([object & {:keys [... digits quantile.type]}]),
   :name summary-default,
   :ns r.base}
  {:arglists ([x]), :name sin, :ns r.base}
  {:arglists ([& {:keys [... na.rm]}]), :name sum, :ns r.base})]
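Since the arglists are ordinary var metadata, a single function can be inspected in the same way as any Clojure var. The sketch below does this for r.base/mean; the name mean-arglists is only a local choice, and the snippet assumes a running R session as in the rest of this tutorial.

```clojure
(require '[clojisr.v1.require :refer [require-r]])

;; Bring in the base package, which creates the r.base namespace.
(require-r '[base])

;; The arglists live in the var's metadata, as for any Clojure function.
(def mean-arglists (:arglists (meta #'r.base/mean)))

(println mean-arglists)
```

According to the check above, the value for mean should be ([x & {:keys [...]}]). Editor tooling that reads :arglists can use the same metadata to show argument hints for R functions.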

Using Renjin

In the following example, we use a different R backend (the pure-JVM Renjin) for reading a CSV file, without changing the default backend (which is the usual R, accessed through Rserve).

(require 'clojisr.v1.renjin)
 (let [path "/tmp/data.csv"]
   (spit path "a,b,c\n1,2,3\n4,5,6\n")
   (->
     ['read.csv path]
     (r :session-args
        {:session-type :renjin})
     (r/r->clj :session-args
               {:session-type :renjin})
     (->>
       (check =
              [{:a 1, :b 2, :c 3}
               {:a 4, :b 5, :c 6}]))))
[:PASSED ({:a 1, :b 2, :c 3} {:a 4, :b 5, :c 6})]
