clojisr.v1.codegen-test

clojisr.v1.codegen-test - created by notespace, Sat Mar 21 00:43:30 CET 2020.
Checks: 34 PASSED

R code generation from the Clojure forms

In the clojisr library, R code can be represented in three main ways:

  • as a string containing R code or a script
  • as an RObject
  • as a Clojure form

RObject is a clojisr data structure that keeps a reference to an R object. It can also act as a function when the referenced object is an R function. An RObject is always returned when R code is executed.

Let's see what is possible in detail.

First, require the necessary namespaces.

(require '[clojisr.v1.rserve :as rserve]
         '[clojisr.v1.r :as r :refer [r ->code r->clj]])

Also, let us make sure we are using a clean session.

(rserve/set-as-default!)
(r/discard-all-sessions)

R code as a string

To run any R code, given either as a string or as a Clojure form, we use the clojisr.v1.r/r function.

(r
   "mean(rnorm(100000,mean=1.0,sd=3.0))")
[1] 1.004116
(r
   "abc <- runif(1000);
          f <- function(x) {mean(log(x))};
          f(abc)")
[1] -0.9599674

As mentioned above, every call to r creates an RObject and an R variable that holds the result of the execution.

(def result (r "rnorm(10)"))
#'clojisr.v1.codegen-test/result
(class result)
clojisr.v1.robject.RObject
(:object-name result)
".MEM$x6ea2ff6ded614851"

Let's use the var name string to see what it represents.

(r (:object-name result))
 [1] -0.93833942  1.86203423 -0.04160195  0.29983358 -1.30464486 -0.79969858
 [7]  0.59744297 -0.87494045  0.64271973  0.76939979

Now let us move on to the RObject data type.

RObject

Every RObject acts as a Clojure reference to an R variable. All these variables are held in an R environment called .MEM. An RObject can represent anything and can be used for further evaluation, even acting as a function if it corresponds to an R function. Here are some examples:

An r-object holding some R data:

(def dataset (r "nhtemp"))
#'clojisr.v1.codegen-test/dataset

An r-object holding an R function:

(def function (r "mean"))
#'clojisr.v1.codegen-test/function

Printing the data:

dataset
Time Series:
Start = 1912 
End = 1971 
Frequency = 1 
 [1] 49.9 52.3 49.4 51.1 49.4 47.9 49.8 50.9 49.3 51.9 50.8 49.6 49.3 50.6 48.4
[16] 50.7 50.9 50.6 51.5 52.8 51.8 51.1 49.8 50.2 50.4 51.6 51.8 50.9 48.8 51.7
[31] 51.0 50.6 51.7 51.5 52.1 51.3 51.0 54.0 51.4 52.7 53.1 54.6 52.0 52.0 50.9
[46] 52.6 50.2 52.6 51.6 51.9 50.5 50.9 51.7 51.4 51.7 50.8 51.9 51.8 51.9 53.0

Equivalently:

(r dataset)
Time Series:
Start = 1912 
End = 1971 
Frequency = 1 
 [1] 49.9 52.3 49.4 51.1 49.4 47.9 49.8 50.9 49.3 51.9 50.8 49.6 49.3 50.6 48.4
[16] 50.7 50.9 50.6 51.5 52.8 51.8 51.1 49.8 50.2 50.4 51.6 51.8 50.9 48.8 51.7
[31] 51.0 50.6 51.7 51.5 52.1 51.3 51.0 54.0 51.4 52.7 53.1 54.6 52.0 52.0 50.9
[46] 52.6 50.2 52.6 51.6 51.9 50.5 50.9 51.7 51.4 51.7 50.8 51.9 51.8 51.9 53.0

We use r->clj to transfer data from R to Clojure (converting an R object to Clojure data):

(->> (r->clj dataset)
      first
      (check = 49.9))
[:PASSED 49.9]

Creating an R object, applying the function to it, and converting the result to Clojure data (in this pipeline, both function and r return an RObject):

(->> "c(1,2,3,4,5,6)"
      r
      function
      r->clj
      (check = [3.5]))
[:PASSED [3.5]]

Clojure forms

Calling R with the code as a string is quite limited. You can't easily inject Clojure data into the code, and editor support for this way of writing is very limited. So we enable the use of Clojure forms as a DSL to simplify the construction of R code.

In generating R code from Clojure forms, clojisr operates on both the var and the symbol level, and can also digest primitive types and basic data structures. There are some special symbols which help in creating R formulas and defining R functions. We will go through all of these in detail.

The ->code function is responsible for turning Clojure forms into R code.

(->> [1 2 4]
      ->code
      (check = "c(1,2,4)"))
[:PASSED "c(1,2,4)"]

When the r function gets an argument that is not a string, it uses ->code behind the scenes to turn that argument into code as a string.

(r [1 2 4])
[1] 1 2 4
(->> [1 2 4]
      r
      r->clj
      (check = [1.0 2.0 4.0]))
[:PASSED [1.0 2.0 4.0]]

Equivalently:

(->> [1 2 4]
      ->code
      r
      r->clj
      (check = [1.0 2.0 4.0]))
[:PASSED [1.0 2.0 4.0]]

Primitive data types

(->> (r 1)
      r->clj
      (check = [1.0]))
[:PASSED [1.0]]
(->> (r 2.0)
      r->clj
      (check = [2.0]))
[:PASSED [2.0]]
(->> (r 3/4)
      r->clj
      (check = [0.75]))
[:PASSED [0.75]]
(->> (r true)
      r->clj
      (check = [true]))
[:PASSED [true]]
(->> (r false)
      r->clj
      (check = [false]))
[:PASSED [false]]

nil is converted to NULL, or to NA when it appears inside a vector or a map.

(->> (r nil)
      r->clj
      (check = nil))
[:PASSED nil]
(->> (->code nil)
      (check = "NULL"))
[:PASSED "NULL"]

When you pass a string to r, it is treated as code. So we have to add escaped double quotes if we actually mean to represent an R string (or an R character object, as it is called in R). However, when a string is used inside a more complex form, it is quoted automatically.

(->> (->code "\"this is a string\"")
      (check
        =
        "\"\"this is a string\"\""))
[:PASSED "\"\"this is a string\"\""]
(->> (r "\"this is a string\"")
      r->clj
      (check = ["this is a string"]))
[:PASSED ["this is a string"]]
(->> (->code '(paste
                 "this is a string"))
      (check
        =
        "paste(\"this is a string\")"))
[:PASSED "paste(\"this is a string\")"]
(->> (r '(paste "this is a string"))
      r->clj
      (check = ["this is a string"]))
[:PASSED ["this is a string"]]

Any Named Clojure object that is not a String (like a keyword or a symbol) is converted to an R symbol.

(->> (->code :keyword)
      (check = "keyword"))
[:PASSED "keyword"]
(->> (->code 'symb)
      (check = "symb"))
[:PASSED "symb"]

An RObject is converted to the name of the R variable it refers to.

(->code (r "1+2"))
".MEM$xb6f95379b0ce440d"

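A date/time value is converted to a string. Note that in the examples below the #inst literal is read as UTC while the generated string is one hour later, apparently because the string is rendered in the local time zone of the session that produced this page.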

(->> #inst "2031-02-03T11:22:33"
      ->code
      (check =
             "'2031-02-03 12:22:33'"))
[:PASSED "'2031-02-03 12:22:33'"]
(r #inst "2031-02-03T11:22:33")
[1] "2031-02-03 12:22:33"
(->> #inst "2031-02-03T11:22:33"
      r
      r->clj
      (check =
             ["2031-02-03 12:22:33"]))
[:PASSED ["2031-02-03 12:22:33"]]
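
The one-hour shift between the #inst literal (read as UTC) and the generated string can be reproduced in plain Clojure. The sketch below is not clojisr's code; it only assumes that the instant is formatted as yyyy-MM-dd HH:mm:ss in a UTC+1 zone such as CET, which matches the output above. The helper name format-in-zone is hypothetical.

```clojure
(import '(java.text SimpleDateFormat)
        '(java.util TimeZone))

;; Hypothetical helper (not part of clojisr): render a java.util.Date
;; as a string in the given time zone.
(defn format-in-zone [date zone-id]
  (let [fmt (doto (SimpleDateFormat. "yyyy-MM-dd HH:mm:ss")
              (.setTimeZone (TimeZone/getTimeZone zone-id)))]
    (.format fmt date)))

(format-in-zone #inst "2031-02-03T11:22:33" "UTC")
;; => "2031-02-03 11:22:33"
(format-in-zone #inst "2031-02-03T11:22:33" "CET")
;; => "2031-02-03 12:22:33"
```

If the result of such a conversion is compared against a fixed string, as in the checks above, the comparison will presumably depend on the time zone of the machine running it.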

Vectors

A Clojure vector is converted to an R vector created using the c function. That means that nested vectors are flattened. All the values inside are translated to R recursively.

(->> (->code [1 2 3])
      (check = "c(1,2,3)"))
[:PASSED "c(1,2,3)"]
(->> (r [[1] [2 [3]]])
      r->clj
      (check = [1.0 2.0 3.0]))
[:PASSED [1.0 2.0 3.0]]
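
To see why the nesting disappears, here is a minimal pure-Clojure sketch of a recursive translation. It is an illustration only, not clojisr's generator: the helper vec->c is hypothetical, and it assumes that each nested vector is emitted as its own c(...) call. R's c concatenates its arguments, so an expression like c(c(1),c(2,c(3))) evaluates to the flat vector 1 2 3.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper (not part of clojisr): translate a possibly nested
;; Clojure vector of numbers into a string of nested R c(...) calls.
(defn vec->c [v]
  (str "c("
       (string/join "," (map #(if (vector? %) (vec->c %) (str %)) v))
       ")"))

(vec->c [1 2 3])
;; => "c(1,2,3)"
(vec->c [[1] [2 [3]]])
;; => "c(c(1),c(2,c(3)))"
```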

Some Clojure sequences are interpreted as function calls, if it makes sense for their first element. However, sequences beginning with numbers or strings are treated as vectors.

(r (range 11))
 [1]  0  1  2  3  4  5  6  7  8  9 10
(r (map str (range 11)))
 [1] "0"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

Tagged vectors

When the first element of a vector or a sequence is a keyword starting with :!, some special conversion takes place.

keyword     meaning
:!string    vector of strings
:!boolean   vector of logicals
:!int       vector of integers
:!double    vector of doubles
:!named     named vector
:!list      partially named list
:!ct        vector of POSIXct classes
:!lt        vector of POSIXlt classes

nil in a vector is converted to NA.

(->> (r [:!string 1 nil 3])
      r->clj
      (check = ["1" nil "3"]))
[:PASSED ["1" nil "3"]]
(r [:!named 1 2 :abc 3])
        abc 
  1   2   3 
(r [:!list :a 1 :b
     [:!list 1 2 :c ["a" "b"]]])
$a
[1] 1

$b
$b[[1]]
[1] 1

$b[[2]]
[1] 2

$b$c
[1] "a" "b"
(->> (r [:!ct
          #inst "2011-11-01T22:33:11"])
      r->clj
      first
      long)
1320183191
(->> (r [:!lt
          #inst "2011-11-01T22:33:11"])
      r->clj)
{:sec [11.0],
 :min [33],
 :hour [23],
 :mday [1],
 :mon [10],
 :year [111],
 :wday [2],
 :yday [304],
 :isdst [0],
 :zone ["IST"],
 :gmtoff [##NaN]}

When a vector is big enough, it is not transferred directly as code. Instead, the generated code is the name of a newly created R variable holding the corresponding vector data, which is converted via the Java conversion layer.

(->code (range 10000))
".MEM$x38cab57765f84b27"
(->> (r (conj (range 10000) :!string))
      r->clj
      first
      (check = "0"))
[:PASSED "0"]

Maps

A Clojure map is transformed to an R named list. As with vectors, all data elements inside are processed recursively.

(r {:a 1, :b nil})
$a
[1] 1

$b
[1] NA
(->> (r {:a 1, :b nil, :c [2 3 4]})
      r->clj
      (check =
             {:a [1.0],
              :b [nil],
              :c [2.0 3.0 4.0]}))
[:PASSED {:a [1.0], :b [nil], :c [2.0 3.0 4.0]}]

Bigger maps are transferred to R variables via the Java conversion layer.

(->code (zipmap (map #(str "key" %)
                   (range 100))
                 (range 1000 1100)))
".MEM$xb9a72a2da2bf4796"
(->> (r (zipmap (map #(str "key" %)
                   (range 100))
                 (range 1000 1100)))
      r->clj
      :key23
      (check = [1023]))
[:PASSED [1023]]

Calls, operators and special symbols

Now we come to the most important part: using sequences to represent function calls. One way to do that is to use a list whose first element is a symbol corresponding to the name of an R function, or an RObject corresponding to an R function. To create a function call we use the same structure as in Clojure. The two examples below are equivalent.

Recall that symbols are converted to R variable names on the R side.

(r "mean(c(1,2,3))")
[1] 2
(r '(mean [1 2 3]))
[1] 2
(->> (->code '(mean [1 2 3]))
      (check = "mean(c(1,2,3))"))
[:PASSED "mean(c(1,2,3))"]

Here is another example.

(r '(<- x (mean [1 2 3])))
[1] 2
(->> (r 'x)
      r->clj
      (check = [2.0]))
[:PASSED [2.0]]

Here is one more example, this time with an RObject in the function position.

Recall that RObjects are converted to the names of the corresponding R objects.

(-> (list (r 'median) [1 2 4])
     ->code)
".MEM$xe78d516115024fed(c(1,2,4))"
(->> (list (r 'median) [1 2 4])
      r
      r->clj
      (check = [2.0]))
[:PASSED [2.0]]

Some symbols get a special meaning:

symbol             meaning
function           R function definition
tilde or formula   R formula
colon              colon (:)
bra                [
brabra             [[
bra<-              [<-
brabra<-           [[<-

Function definitions

To define a function, use the function symbol with a following vector of argument names, and then the body. Arguments are treated as a partially named list.

(r '(<- stat
         (function [x :median false ...]
                   (ifelse
                     median
                     (median x ...)
                     (mean x ...)))))
function (x, median = FALSE, ...) 
{
    ifelse(median, median(x, ...), mean(x, ...))
}
(->> (r '(stat [100 33 22 44 55]))
      r->clj
      (check = [50.8]))
[:PASSED [50.8]]
(->> (r '(stat [100 33 22 44 55]
                :median
                true))
      r->clj
      (check = [44.0]))
[:PASSED [44.0]]
(->> (r '(stat [100 33 22 44 55 nil]))
      r->clj
      first
      (check #(Double/isNaN %)))
[:PASSED ##NaN]
(->> (r '(stat [100 33 22 44 55 nil]
                :na.rm
                true))
      r->clj
      (check = [50.8]))
[:PASSED [50.8]]

Formulas

To create an R formula, use tilde or formula with two arguments, for the left and right sides (to skip one, just use nil).

(r '(formula y x))
y ~ x
(r '(formula y (| (+ a b c d) e)))
y ~ a + b + c + d | e
(r '(formula nil (| x y)))
~x | y

Operators

(->code '(+ 1 2 3 4 5))
"((((1+2)+3)+4)+5)"
(->code '(/ 1 2 3 4 5))
"((((1/2)/3)/4)/5)"
(->code '(- [1 2 3]))
"-c(1,2,3)"
(->code '(<- a b c 123))
"a<-b<-c<-123"
(->code '($ a b c d))
"(((a$b)$c)$d)"

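The outputs above suggest that an operator with more than two arguments is folded from the left into nested binary calls. Here is a minimal sketch of that folding in plain Clojure, using a hypothetical helper nest-binary. It is not part of clojisr, and it covers only the left-nested cases shown above, not unary minus or the chained <- assignment.

```clojure
;; Hypothetical helper (not part of clojisr): fold the arguments from the
;; left, wrapping each intermediate binary expression in parentheses.
(defn nest-binary [op args]
  (reduce (fn [acc x] (str "(" acc op x ")")) args))

(nest-binary "+" [1 2 3 4 5])
;; => "((((1+2)+3)+4)+5)"
(nest-binary "$" '[a b c d])
;; => "(((a$b)$c)$d)"
```
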
Unquoting

Sometimes we want to use objects created outside our form (defined earlier or in a let). For this case you can use the unquote (~) symbol. There are two options:

  • when using quoting ('), unquote evaluates the unquoted form using eval. eval has some constraints; the most important is that local bindings (let bindings) can't be used.
  • when using syntax quoting (backquote `), unquote acts as in Clojure macros: all unquoted forms are evaluated instantly.

(def v (r '(+ 1 2 3 4)))
(r '(* 22.0 ~v))
[1] 220
(let [local-v (r '(+ 1 2 3 4))
       local-list [4 5 6]]
   (r `(* 22.0 ~local-v ~@local-list)))
[1] 26400
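
The difference between the two quoting styles is visible before clojisr is involved at all. In plain Clojure, the reader turns ~v inside a regular quote into an unevaluated (clojure.core/unquote v) form, which is then left for r to resolve, whereas syntax quote substitutes the values immediately (and namespace-qualifies symbols such as *). The names v and xs below are only for illustration.

```clojure
;; Regular quote: ~v stays in the form as data; nothing is evaluated yet.
(= '(* 22.0 ~v)
   (list '* 22.0 (list 'clojure.core/unquote 'v)))
;; => true

;; Syntax quote: local bindings are substituted on the spot,
;; and ~@ splices the elements of the vector into the form.
(let [v 10
      xs [4 5 6]]
  `(* 22.0 ~v ~@xs))
;; => (clojure.core/* 22.0 10 4 5 6)
```

This is why let-bound values work with the backquote: by the time r receives the form, it already contains the values rather than the symbols.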

Calling R functions

You are not limited to the use of code forms. When an RObject corresponds to an R function, it can be used and called like a normal Clojure function.

(def square
   (r '(function [x] (* x x))))
#'clojisr.v1.codegen-test/square
(->> (square 123)
      r->clj
      first
      (check = 15129.0))
[:PASSED 15129.0]
