Programming clojure – Recursion and Lazy-seq-阿里云开发者社区

5.1 Functional Programming Concepts

The Six Rules
Although the benefits of FP are compelling, FP is a wholesale change from the imperative programming style that dominates much of the programming world today. Plus, Clojure takes a unique approach to FP that strikes a balance between academic purity and the reality of running well on the JVM. That means there is a lot to learn all at once.
But fear not. If you are new to FP, the following “Six Rules of Clojure FP” will help you on your initial steps toward FP mastery, Clojure style:
1. Avoid direct recursion, 因为JVM不会优化, 直接导致爆stack
    The JVM cannot optimize recursive calls, and Clojure programs that recurse will blow their stack.
2. Use recur when you are producing scalar values or small, fixed sequences
    Clojure will optimize calls that use an explicit recur.
3. When producing large or variable-sized sequences, always be lazy
    (Do not recur.) Then, your callers can consume just the part of the sequence they actually need.

4. Be careful not to realize more of a lazy sequence than you need
5. Know the sequence library
You can often write code without using recur or the lazy APIs at all.
6. Subdivide
Divide even simple-seeming problems into smaller pieces, and you will often find solutions in the sequence library that lead to more general, reusable code.

其实只讲两点,
不要直接使用递归, 对于较小的有限seq使用recur优化, 对较大的或无限seq使用lazy, 并且不要随便realize你不需要的lazy seq部分.

不要闭门造车, 多使用sequence库.

5.2 How to Resolve Recursion

FP大量的使用递归, 当然直接使用递归会有爆stack问题, 所以怎么实现递归就成为一个问题, 对于所有FP都需要解决...
上面的1,2,3,4 rule清晰描述了方法,
要不使用tail recursion, 参考Pratical Cljr – loop/recur

要不使用lazy-seq, 通常你不知道该用哪个的时候, 用lazy总是没错的The Joy of Clojure – Laziness(6.3)

lazy-seq, 不需要recursion一定在tail, 但不是所有的case都可以表示为seq的形式

两种方法的思路其实是一样的, 避免使用stack, 这个取决于自底向上的设计

Functional programs make great use of recursive definitions.

A recursive definition consists of two parts:
• A basis, which explicitly enumerates some members of the sequence
• An induction, which provides rules for combining members of the sequence to produce additional members

Our challenge in this section is converting a recursive definition into working code.
There are many ways you might do this:
• A simple recursion, using a function that calls itself in some way to implement the induction step.
• A tail recursion, using a function only calling itself at the tail end of its execution. Tail recursion enables an important optimization.
• A lazy sequence that eliminates actual recursion and calculates a value later, when it is needed.

Choosing the right approach is important. Implementing a recursive definition poorly can lead to code that performs terribly, consumes all available stack and fails, consumes all available heap and fails, or does all of these. In Clojure, being lazy is often the right approach.

Fibonacci numbers, 以此为例

Named for the Italian mathematician Leonardo (Fibonacci) of Pisa (c.1170 – c.1250), the Fibonacci numbers were actually known to Indian mathematicians as far back as 200 BC. The Fibonacci numbers have many interesting properties, and they crop up again and again in algorithms, data structures, and even biology

The Fibonaccis have a very simple recursive definition:
• Basis: F0, the zeroth Fibonacci number, is zero. F1, the first Fibonacci number, is one.
• Induction: For n > 1, Fn equals Fn−1+Fn−2.

(0 1 1 2 3 5 8 13 21 34)

直接递归

自顶向下的思路, f(n)=f(n-1)+f(n-2), 完全需要stack的支持

(defn stack-consuming-fibo [n]
  (cond
   (= n 0) 0      ;basis
   (= n 1) 1      ;basis
   :else (+ (stack-consuming-fibo (- n 1))    
	    (stack-consuming-fibo (- n 2)))))   ;induction

(stack-consuming-fibo 9)
34

(stack-consuming-fibo 1000000) ;会爆stack
java.lang.StackOverflowError

使用recur, tail-call optimization

直接使用上面的逻辑是无法进行tail-call optimization, 必须修改成自底向上的思路, 你如果直接看下面的递归有些困难, ok, 我换种写法

current = 0
next = 1
for(i =n; i>0; i--)
{

current = next
    next = current + next

}

所谓的尾部优化, 其实就是, 你如果可以写出for循环, 那么编译器就可以将stack调用优化成循环...

所以下面的递归调用, 其实是伪装成递归的循环(因为在FP中你无法直接写循环, 所以必须以递归的形式出现)

(defn recur-fibo [n]  
  (letfn [(fib  ; letfn is like let but is dedicated to letting local functions          
          [current next n] ;fib 3个参数          
           (if (zero? n)            
            current                          
            (recur next (+ current next) (dec n))))] ;tail-call optimization, 用recur替换函数名fib  
  (fib 0 1 n))) 

(recur-fibo 1000000) 
195 ... 208,982 other digits ... 875 ;不会导致爆stack, 因为recur优化, 没有使用stack

为什么要显示的表明recur, 进行tail-call optimization?
The problem here is the JVM. While functional languages such as Haskell can perform TCO, the JVM was not designed for functional languages.
No language that runs directly on the JVM can perform automatic TCO.
The absence of TCO is unfortunate but not a showstopper for functional programs. Clojure provides several pragmatic workarounds:
explicit self-recursion with recur, lazy sequences, and explicit mutual recursion with trampoline.

Lazy, 解决递归问题更好的方案

上面的Fibonacci数问题, 换个思路, 其实求n的Fibonacci 值, 就是在Fibonacci Seq上面取第n个item的值

所以我们可以定义lazy fib-seq, 这边思路和上面是不一样的, 逻辑是怎样生成seq, 更关键的是seq生成的思路必须是自底向上的

这个在FP里面至关重要, 因为只有自底向上的思路是不依赖stack的, 而如果自顶向下的思路一定需要stack的支持

对于lazy-seq为什么可以解决blow stack问题, 参考The Joy of Clojure – Laziness(6.3)

(defn lazy-seq-fibo 
  ([] 
     (concat [0 1] (lazy-seq-fibo 0 1))) ;concat, 两个seq串联 
  ([a b]
     (let [n (+ a b)]    ;算出下个值               
      (lazy-seq         ;lazy body               
	(cons n (lazy-seq-fibo b n)))))) ;cons, item和seq串联

(take 10 (lazy-seq-fibo))
(0 1 1 2 3 5 8 13 21 34)

(rem (nth (lazy-seq-fibo) 1000000) 1000) ;rem求余数, 实际效果取最后3位数字
875

上面的做法已经满足1,2,3,4 rule

By rule 5, you can reuse existing sequence library functions that return lazy sequences.

直接使用Sequence库, 默认返回lazy seq

(defn fibo []
(map first (iterate (fn [[a b]] [b (+ a b)]) [0 1])))

Lazy definitions consume some stack and heap.
But they do not consume resources proportional to the size of an entire (possibly infinite!) sequence. Instead, you choose how many resources to consume when you traverse the sequence.

Coming to Realization, 变现

Lazy sequences consume significant resources only as they are realized, that is, as a portion of the sequence is actually instantiated in memory.
Clojure works hard to be lazy and avoid realizing sequences until it is absolutely necessary.

(def lots-o-fibs (take 1000000000 (fibo))) ;定义很大的fibo, 但并没有执行

(nth lots-o-fibs 100) ;只会计算100的fibo, 后面的不会继续算, 这就是lazy的优势

Most sequence functions return lazy sequences.
If you are not sure whether a function returns a lazy sequence, the function’s documentation string typically will tell you the answer:

(doc take)

-------------------------

clojure.core/take

([n coll])

Returns a lazy seq of the first n items in coll, or all items if there are fewer than n.

The REPL, however, is not lazy.
The printer used by the REPL will, by default, print the entirety of a collection.
That is why we stuffed the first billion Fibonaccis into lots-o-fibs, instead of evaluating them at the REPL. Don’t enter the following at the REPL:

; don't do this

(take 1000000000 (fibo))

为什么REPL不是Lazy?
其实不是说这个时候不lazy, 而是不应该lazy. Lazy是当用不到的时候不去eval, 但是在REPL中执行(fibo), REPL会试图print每个值, 这样相当于force所有的值eval.
所以解决这个问题的方法, 很简单, 只需要限制REPL的默认print次数
(set! *print-length* 10)
(fibo)
(0 1 1 2 3 5 8 13 21 34 ...) ;只会eval10次

Losing Your Head

; holds the head (avoid!)
(def head-fibo (lazy-cat [0 1] (map + head-fibo (rest head-fibo))))

(nth head-fibo 1000000)
java.lang.OutOfMemoryError: Java heap space ;这儿不是stack blew, 是memory blew

原因就是本来lazy seq不停往后eval item的时候, 前面的值是会被回收的, 而这儿hold住head-fibo导致, 自动回收失效, 以至于把内存blew掉...

Unless you want to cache a sequence as you traverse it, you must be careful not to keep a reference to the head of the sequence.

With lazy sequences, losing your head is often a good idea.

5.4 Recursion Revisited

http://www.blogjava.net/killme2008/archive/2010/07/14/326129.html, 参考这个blog, 说的比较清楚

Shortcutting Recursion with Memoization, 记事本

clojure.core/memoize
([f])
Returns a memoized version of a referentially transparent function. The
memoized version of the function keeps a cache of the mapping from arguments
to results and, when calls with the same arguments are repeated often, has
higher performance at the expense of higher memory use.

非常简单的, 其实就是返回memoized版本的function, 会在memory里面buffer之前的结果, 当再次调用的时候可以快速返回.
下面的例子很好的反应出它的用途, 如果没有memoize, 一定会blew stack, 因为这是典型的自上而下的思路, 但是加上memoize就不会出现这种情况, 为什么?
因为Seq的eval顺序是从小到大的, 并且所有的结果都会被buffer, 所以后面的递归其实不用真正做, 因为结果已经buffer了...

(defn fac [n]
          (if (<= n 1)
              1
              (* n (fac (dec n)))))

(def fac (memoize fac))
(def fac-seq (map fac (iterate inc 0)))
(nth fac-seq 10000)

相互递归和trampoline

http://www.blogjava.net/killme2008/archive/2010/08/22/329576.html

相互递归问题, 比较典型的是求奇偶问题, 这种问题怎么解决blew stack问题?

(declare my-odd? my-even?)
(defn my-odd? [n]
      (if (= n 0)
          false
         (my-even? (dec n))))
(defn my-even? [n]
      (if (= n 0)
          true
         (my-odd? (dec n))))

思路, 两个函数无法recur, 所以封装成一个函数, 就可以使用recur

(defn parity [n]
      (loop [n n par 0]
            (if (= n 0)
                par
                (recur (dec n) (- 1 par)))))
user=> (parity 3)
1
user=> (parity 100)
0

但终究这个办法不是好的解决方案, clojure提供trampoline来解决这种问题

Clojure’s trampoline function invokes one of your mutually recursive functions:

(trampoline f & partial-args)

(defn trampoline
  ([f]
     (let [ret (f)]
       (if (fn? ret)
          (recur ret)
          ret)))

trampoline接收一个函数做参数并调用，
如果结果为函数，那么就用recur做尾递归优化，
如果不是，则返回结果，主要就是为了解决相互递归调用堆栈溢出的问题

(trampoline + 1 2) ;函数返回的不是函数是, 和一般函数没有不同, 直接返回结果
3

使用trampoline, 只需要做个小改动, 就是把原来的递归调用的地方, 改成返回匿名函数

(declare my-odd? my-even?)
(defn my-odd? [n]
      (if (= n 0)
          false
         #(my-even? (dec n))))
(defn my-even? [n]
      (if (= n 0)
          true
         #(my-odd? (dec n))))

user=> (trampoline my-odd? 10000000) ;所以这样使用就不会blew stack, 因为trampoline会使用recur, 象跳床一样, 如果是函数就反复弹,至到得到结果
false
user=> (trampoline my-even? 10000)
true

本文章摘自博客园，原文发布日期：2013-02-07

Programming clojure – Recursion and Lazy-seq