R code in the clojisr library can be represented in three main ways: as a string, as an RObject, or as a Clojure form.
An RObject is a clojisr data structure that keeps a reference to an R object. It can also act as a function when the referenced object is an R function. An RObject is always returned when R code is executed.
Let's see what is possible in detail.
First, require the necessary namespaces.

    (require '[clojisr.v1.rserve :as rserve]
             '[clojisr.v1.r :as r :refer [r ->code r->clj]])

Also, let us make sure we are using a clean session.

    (rserve/set-as-default!)
    (r/discard-all-sessions)

To run any R code, given as a string or as a Clojure form, we use the clojisr.v1.r/r function.

    (r "mean(rnorm(100000,mean=1.0,sd=3.0))")
    ;; => [1] 1.004116

    (r "abc <- runif(1000); f <- function(x) {mean(log(x))}; f(abc)")
    ;; => [1] -0.9599674

As mentioned above, every call to r creates an RObject and an R variable that holds the result of the execution.

    (def result (r "rnorm(10)"))
    ;; => #'clojisr.v1.codegen-test/result

    (class result)
    ;; => clojisr.v1.robject.RObject

    (:object-name result)
    ;; => ".MEM$x6ea2ff6ded614851"

Let's use the variable name string to see what it represents.

    (r (:object-name result))
    ;; =>  [1] -0.93833942  1.86203423 -0.04160195  0.29983358 -1.30464486 -0.79969858
    ;;     [7]  0.59744297 -0.87494045  0.64271973  0.76939979

Now let us move on to the RObject data type.
Every RObject acts as a Clojure reference to an R variable. All these variables are held in an R environment called .MEM. An RObject can represent anything and can be used for further evaluation; it can even act as a function if it corresponds to an R function. Here are some examples:
An r-object holding some R data:

    (def dataset (r "nhtemp"))
    ;; => #'clojisr.v1.codegen-test/dataset

An r-object holding an R function:

    (def function (r "mean"))
    ;; => #'clojisr.v1.codegen-test/function

Printing the data:

    dataset
    ;; => Time Series:
    ;;    Start = 1912
    ;;    End = 1971
    ;;    Frequency = 1
    ;;     [1] 49.9 52.3 49.4 51.1 49.4 47.9 49.8 50.9 49.3 51.9 50.8 49.6 49.3 50.6 48.4
    ;;    [16] 50.7 50.9 50.6 51.5 52.8 51.8 51.1 49.8 50.2 50.4 51.6 51.8 50.9 48.8 51.7
    ;;    [31] 51.0 50.6 51.7 51.5 52.1 51.3 51.0 54.0 51.4 52.7 53.1 54.6 52.0 52.0 50.9
    ;;    [46] 52.6 50.2 52.6 51.6 51.9 50.5 50.9 51.7 51.4 51.7 50.8 51.9 51.8 51.9 53.0

Equivalently:

    (r dataset)
    ;; => Time Series:
    ;;    Start = 1912
    ;;    End = 1971
    ;;    Frequency = 1
    ;;     [1] 49.9 52.3 49.4 51.1 49.4 47.9 49.8 50.9 49.3 51.9 50.8 49.6 49.3 50.6 48.4
    ;;    [16] 50.7 50.9 50.6 51.5 52.8 51.8 51.1 49.8 50.2 50.4 51.6 51.8 50.9 48.8 51.7
    ;;    [31] 51.0 50.6 51.7 51.5 52.1 51.3 51.0 54.0 51.4 52.7 53.1 54.6 52.0 52.0 50.9
    ;;    [46] 52.6 50.2 52.6 51.6 51.9 50.5 50.9 51.7 51.4 51.7 50.8 51.9 51.8 51.9 53.0

We use r->clj to transfer data from R to Clojure (converting an R object to Clojure data):

    (->> (r->clj dataset) first (check = 49.9))
    ;; => [:PASSED 49.9]

Creating an R object, applying the function to it, and converting the result to Clojure data (in this pipeline, both function and r return an RObject):

    (->> "c(1,2,3,4,5,6)" r function r->clj (check = [3.5]))
    ;; => [:PASSED [3.5]]

Calling R with the code as a string is quite limited. You can't easily inject Clojure data into the code, and editor support for code written this way is very limited. So we enable the use of Clojure forms as a DSL to simplify the construction of R code.
In generating R code from Clojure forms, clojisr operates on both the var and the symbol level, and can also digest primitive types and basic data structures. There are some special symbols which help in creating R formulas and defining R functions. We will go through all of these in detail.
The ->code function is responsible for turning Clojure forms into R code.

    (->> [1 2 4] ->code (check = "c(1,2,4)"))
    ;; => [:PASSED "c(1,2,4)"]

When the r function gets an argument that is not a string, it uses ->code behind the scenes to turn that argument into code as a string.

    (r [1 2 4])
    ;; => [1] 1 2 4

    (->> [1 2 4] r r->clj (check = [1.0 2.0 4.0]))
    ;; => [:PASSED [1.0 2.0 4.0]]

Equivalently:

    (->> [1 2 4] ->code r r->clj (check = [1.0 2.0 4.0]))
    ;; => [:PASSED [1.0 2.0 4.0]]

    (->> (r 1) r->clj (check = [1.0]))
    ;; => [:PASSED [1.0]]

    (->> (r 2.0) r->clj (check = [2.0]))
    ;; => [:PASSED [2.0]]

    (->> (r 3/4) r->clj (check = [0.75]))
    ;; => [:PASSED [0.75]]

    (->> (r true) r->clj (check = [true]))
    ;; => [:PASSED [true]]

    (->> (r false) r->clj (check = [false]))
    ;; => [:PASSED [false]]

nil is converted to NULL, or to NA when it appears inside a vector or a map.

    (->> (r nil) r->clj (check = nil))
    ;; => [:PASSED nil]

    (->> (->code nil) (check = "NULL"))
    ;; => [:PASSED "NULL"]

When you pass a string to r, it is treated as code. So we have to escape double quotes if we actually mean to represent an R string (or an R character object, as it is called in R). However, when a string is used inside a more complex form, it is escaped automatically.

    (->> (->code "\"this is a string\"") (check = "\"\"this is a string\"\""))
    ;; => [:PASSED "\"\"this is a string\"\""]

    (->> (r "\"this is a string\"") r->clj (check = ["this is a string"]))
    ;; => [:PASSED ["this is a string"]]

    (->> (->code '(paste "this is a string")) (check = "paste(\"this is a string\")"))
    ;; => [:PASSED "paste(\"this is a string\")"]

    (->> (r '(paste "this is a string")) r->clj (check = ["this is a string"]))
    ;; => [:PASSED ["this is a string"]]

Any Named Clojure object that is not a String (such as a keyword or a symbol) is converted to an R symbol.

    (->> (->code :keyword) (check = "keyword"))
    ;; => [:PASSED "keyword"]

    (->> (->code 'symb) (check = "symb"))
    ;; => [:PASSED "symb"]

An RObject is converted to an R variable.

    (->code (r "1+2"))
    ;; => ".MEM$xb6f95379b0ce440d"

Date/time is converted to a string.

    (->> #inst "2031-02-03T11:22:33" ->code (check = "'2031-02-03 12:22:33'"))
    ;; => [:PASSED "'2031-02-03 12:22:33'"]

    (r #inst "2031-02-03T11:22:33")
    ;; => [1] "2031-02-03 12:22:33"

    (->> #inst "2031-02-03T11:22:33" r r->clj (check = ["2031-02-03 12:22:33"]))
    ;; => [:PASSED ["2031-02-03 12:22:33"]]

A Clojure vector is converted to an R vector created using the c function. That means that nested vectors are flattened. All the values inside are translated to R recursively.

    (->> (->code [1 2 3]) (check = "c(1,2,3)"))
    ;; => [:PASSED "c(1,2,3)"]

    (->> (r [[1] [2 [3]]]) r->clj (check = [1.0 2.0 3.0]))
    ;; => [:PASSED [1.0 2.0 3.0]]

Some Clojure sequences are interpreted as function calls, if it makes sense for their first element. However, sequences beginning with numbers or strings are treated as vectors.

    (r (range 11))
    ;; =>  [1]  0  1  2  3  4  5  6  7  8  9 10

    (r (map str (range 11)))
    ;; =>  [1] "0"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

When the first element of a vector or a sequence is a keyword starting with :!, some special conversion takes place.

| keyword | meaning |
|---|---|
| :!string | vector of strings |
| :!boolean | vector of logicals |
| :!int | vector of integers |
| :!double | vector of doubles |
| :!named | named vector |
| :!list | partially named list |
| :!ct | vector of POSIXct classes |
| :!lt | vector of POSIXlt classes |
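
As a small sketch of how the prefixes in the table are used, the prefix keyword goes first in the vector and the remaining elements are the data. This assumes the r, ->code and r->clj functions required at the top of this document and a running R session; the var names ints and flags are made up for illustration, and printed results are left out because they depend on your session.

```clojure
(require '[clojisr.v1.r :refer [r ->code r->clj]])

;; Ask for an integer vector instead of the default numeric (double) vector.
(def ints (r [:!int 1 2 3]))

;; A logical vector; the nil is expected to become NA, as in other vectors.
(def flags (r [:!boolean true false nil]))

;; Inspect the generated R code without keeping the result.
(println (->code [:!double 1 2 3]))

;; Convert the R objects back to Clojure data.
(println (r->clj ints))
(println (r->clj flags))
```

Printing the generated code with ->code first is a convenient way to check what a prefix does before evaluating it in R.
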
nil in a vector is converted to NA

    (->> (r [:!string 1 nil 3]) r->clj (check = ["1" nil "3"]))
    ;; => [:PASSED ["1" nil "3"]]

    (r [:!named 1 2 :abc 3])
    ;; =>         abc
    ;;      1   2   3

    (r [:!list :a 1 :b [:!list 1 2 :c ["a" "b"]]])
    ;; => $a
    ;;    [1] 1
    ;;
    ;;    $b
    ;;    $b[[1]]
    ;;    [1] 1
    ;;
    ;;    $b[[2]]
    ;;    [1] 2
    ;;
    ;;    $b$c
    ;;    [1] "a" "b"

    (->> (r [:!ct #inst "2011-11-01T22:33:11"]) r->clj first long)
    ;; => 1320183191

    (->> (r [:!lt #inst "2011-11-01T22:33:11"]) r->clj)
    ;; => {:sec [11.0], :min [33], :hour [23], :mday [1], :mon [10], :year [111],
    ;;     :wday [2], :yday [304], :isdst [0], :zone ["IST"], :gmtoff [##NaN]}

When a vector is big enough, it is not transferred directly as code. Instead, the generated code is the name of a newly created R variable holding the corresponding vector data, which is converted via the Java conversion layer.

    (->code (range 10000))
    ;; => ".MEM$x38cab57765f84b27"

    (->> (r (conj (range 10000) :!string)) r->clj first (check = "0"))
    ;; => [:PASSED "0"]

A Clojure map is transformed to an R named list. As with vectors, all data elements inside are processed recursively.

    (r {:a 1, :b nil})
    ;; => $a
    ;;    [1] 1
    ;;
    ;;    $b
    ;;    [1] NA

    (->> (r {:a 1, :b nil, :c [2 3 4]}) r->clj (check = {:a [1.0], :b [nil], :c [2.0 3.0 4.0]}))
    ;; => [:PASSED {:a [1.0], :b [nil], :c [2.0 3.0 4.0]}]

Bigger maps are transferred to R variables via the Java conversion layer.

    (->code (zipmap (map #(str "key" %) (range 100)) (range 1000 1100)))
    ;; => ".MEM$xb9a72a2da2bf4796"

    (->> (r (zipmap (map #(str "key" %) (range 100)) (range 1000 1100))) r->clj :key23 (check = [1023]))
    ;; => [:PASSED [1023]]

Now we come to the most important part: using sequences to represent function calls. One way to do that is to use a list whose first element is either a symbol naming an R function or an RObject corresponding to an R function. To create a function call, we use the same structure as in Clojure. The first two examples below are equivalent.
Recall that symbols are converted to R variable names on the R side.

    (r "mean(c(1,2,3))")
    ;; => [1] 2

    (r '(mean [1 2 3]))
    ;; => [1] 2

    (->> (->code '(mean [1 2 3])) (check = "mean(c(1,2,3))"))
    ;; => [:PASSED "mean(c(1,2,3))"]

Here is another example.

    (r '(<- x (mean [1 2 3])))
    ;; => [1] 2

    (->> (r 'x) r->clj (check = [2.0]))
    ;; => [:PASSED [2.0]]

Here is another example.
Recall that RObjects are converted to the names of the corresponding R objects.

    (-> (list (r 'median) [1 2 4]) ->code)
    ;; => ".MEM$xe78d516115024fed(c(1,2,4))"

    (->> (list (r 'median) [1 2 4]) r r->clj (check = [2.0]))
    ;; => [:PASSED [2.0]]

Some symbols have a special meaning:

| symbol | meaning |
|---|---|
| function | R function definition |
| tilde or formula | R formula |
| colon | colon (:) |
| bra | [ |
| brabra | [[ |
| bra<- | [<- |
| brabra<- | [[<- |
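
The colon and bra symbols from the table can be tried out as follows. This is only a sketch: it assumes the r, ->code and r->clj functions required at the top of this document and a running R session. The names v and second-element, and the symbol x in the code-only example, are made up for illustration. The generated strings are printed rather than stated here, since their exact form is not shown in this document.

```clojure
(require '[clojisr.v1.r :refer [r ->code r->clj]])

;; colon stands for R's : operator, so this form describes the range from 1 to 10.
(println (->code '(colon 1 10)))

;; bra stands for R's [ operator, i.e. indexing; x is just a symbol here.
(println (->code '(bra x 2)))

;; Build an indexing call on an existing RObject by constructing the list directly.
(def v (r [10 20 30 40]))
(def second-element (r (list 'bra v 2)))
(println (r->clj second-element))
```

Building the form with list, as in the median example earlier, places the RObject itself into the call without relying on unquoting.
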
To define a function, use the function symbol with a following vector of argument names, and then the body. Arguments are treated as a partially named list.

    (r '(<- stat (function [x :median false ...]
                   (ifelse median
                           (median x ...)
                           (mean x ...)))))
    ;; => function (x, median = FALSE, ...)
    ;;    {
    ;;        ifelse(median, median(x, ...), mean(x, ...))
    ;;    }

    (->> (r '(stat [100 33 22 44 55])) r->clj (check = [50.8]))
    ;; => [:PASSED [50.8]]

    (->> (r '(stat [100 33 22 44 55] :median true)) r->clj (check = [44.0]))
    ;; => [:PASSED [44.0]]

    (->> (r '(stat [100 33 22 44 55 nil])) r->clj first (check #(Double/isNaN %)))
    ;; => [:PASSED ##NaN]

    (->> (r '(stat [100 33 22 44 55 nil] :na.rm true)) r->clj (check = [50.8]))
    ;; => [:PASSED [50.8]]

To create an R formula, use tilde or formula with two arguments, for the left and right sides (to skip one, just use nil).

    (r '(formula y x))
    ;; => y ~ x

    (r '(formula y (| (+ a b c d) e)))
    ;; => y ~ a + b + c + d | e

    (r '(formula nil (| x y)))
    ;; => ~x | y

Operators given more than two arguments are expanded into chained binary operations, and - with a single argument becomes a unary minus:

    (->code '(+ 1 2 3 4 5))
    ;; => "((((1+2)+3)+4)+5)"

    (->code '(/ 1 2 3 4 5))
    ;; => "((((1/2)/3)/4)/5)"

    (->code '(- [1 2 3]))
    ;; => "-c(1,2,3)"

    (->code '(<- a b c 123))
    ;; => "a<-b<-c<-123"

    (->code '($ a b c d))
    ;; => "(((a$b)$c)$d)"

Sometimes we want to use objects created outside our form (defined earlier or in a let). For this case you can use the unquote (~) symbol. There are two options:
- With a plain quote ('), unquote evaluates the unquoted form using eval. eval has some constraints; the most important is that local bindings (let bindings) can't be used.
- With a syntax quote (`), unquote (~) and unquote-splicing (~@) behave as in ordinary Clojure, so local bindings can be used.

    (def v (r '(+ 1 2 3 4)))
    (r '(* 22.0 ~v))
    ;; => [1] 220

    (let [local-v (r '(+ 1 2 3 4))
          local-list [4 5 6]]
      (r `(* 22.0 ~local-v ~@local-list)))
    ;; => [1] 26400

You are not limited to code forms. When an RObject corresponds to an R function, it can be called like a normal Clojure function.

    (def square (r '(function [x] (* x x))))
    ;; => #'clojisr.v1.codegen-test/square

    (->> (square 123) r->clj first (check = 15129.0))
    ;; => [:PASSED 15129.0]