[Erlang 0119] A Guide to Reading the Erlang/OTP Source Code


唐玄奘 2018-01-03 22:40:43

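Last week someone in the Erlang discussion group brought up how the ++ operator on lists is implemented. Most of the argument was guesswork, when simply opening the code would have settled it. After I posted a screenshot of the code, someone asked: where did you find it?

  "Where do I find the code?" As far as roadmaps for reading the Erlang source go, only one fragmentary copy still circulates. I think questions like "where is the code?" come from an information gap and are not hard in themselves. It is like the scene in Slumdog Millionaire: Jamal knows every odd detail of street life but cannot say what is written on the national emblem. Let us start this exploration from that scene.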
 
 
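INT. STUDIO - NIGHT
Prem: This question is worth four thousand rupees... The national emblem of India has three lions. What is written beneath them? Is it...
                   A. Truth alone triumphs  B. Lies alone triumph  C. Fashion alone triumphs  D. Money alone triumphs
[Prem turns to the audience with a look of mock bewilderment, drawing a laugh.]
Prem: Which do you think it is, Jamal? It is the most famous sentence in our country's history. Or perhaps you would like to phone a friend?
[The audience roars with laughter. A bead of sweat runs down Jamal's forehead. Prem enjoys Jamal's unease.]
Prem: Or ask the audience? My gut tells me they may know the answer. What do you want to do?
Jamal: Yes.
Prem (surprised): Yes what?
Jamal: Ask the audience.
[Prem whistles and looks up at the audience.]
Prem: Then, ladies and gentlemen, please help him out. Press your keypads now.
[The lights dim. Tense music starts.]

INT. INSPECTOR'S OFFICE - DAY
[The Inspector presses pause and sighs.]
Inspector: Jamal, my five-year-old daughter knows the answer, and you don't. Isn't that strange for a genius millionaire? What happened? Did your accomplice step out for a piss? Or did he not cough loudly enough?
[Silence. Constable Srinivas kicks Jamal's chair.]
Constable Srinivas: The Inspector asked you a question.
Jamal: At Jeevan's snack stall on Chowpatty beach, how much are the fried crisps?
Inspector: What?
Jamal: One plate of fried crisps. How much?
Constable Srinivas (unable to resist): Ten rupees.
Jamal: Wrong. Fifteen since Diwali. Last Thursday, who stole Constable Varma's bicycle outside Dadar station?
Inspector (amused): You know who stole it?
Jamal: Everyone in Juhu knows. Even a five-year-old knows.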
 
 Back to the topic: we start with downloading the code...
 

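Downloading the source

 
GitHub: https://github.com/erlang/otp
Official download page: http://www.erlang.org/download.html 
 
  A note for anyone who chose the Windows installer: the lib directory contains both the source and the ebin of each library (kernel, stdlib and so on), but the ERTS directory ships without source. Download a copy yourself, or browse it online at https://github.com/erlang/otp/tree/maint/erts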
 
 

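Tools for reading the source

 
      The Erlang/OTP source tree is not small, and good tools save a lot of effort: folder-wide or project-wide search helps, and jumping between definitions saves even more trouble. On Windows, a tool like Everything is excellent for locating files, and Visual Studio is a very pleasant way to read the C code. If you prefer a plain text editor and regular expressions, that works too.

[Screenshot: source code open in Visual Studio]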
 
 

Overview

 
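    The top-level layout of otp_src is visible as soon as you open the folder (hardly a thirty-thousand-foot view). The two parts closest to the code we write every day are ERTS and lib. ERTS (Erlang Run-Time System) holds the code of the Erlang runtime system and is the infrastructure of Erlang. lib holds the implementations of all the surrounding libraries. Where some modules live is counter-intuitive, though you get used to it. For example, file.erl is in kernel, not stdlib. Surely gen_server and gen_fsm are implemented in kernel? Wrong, their code is under stdlib. Yet application.erl is in kernel.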
 
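Kernel
   
   Look at the kernel directory: a bit bewildering? A running Erlang system always has a kernel application, and starting appmon lets us see dynamically which modules kernel involves. From that we can roughly infer the designers' scoping: kernel covers application management, code lifecycle management, IO (file IO, network IO, io_request), HiPE, the distribution infrastructure, and so on.

[Mind map: kernel modules grouped by area]
 
 
  The grouping above is only my personal view; for easier lookup I also converted the mind map into text.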
 

  

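stdlib 
 
   Compared with kernel, stdlib lives up to its name and contains the great majority of general-purpose modules, such as lists, ets and the various data structure implementations. Most importantly, it contains OTP's gen_server, gen_fsm, gen_event and supervisor, along with the unsung heroes proc_lib and sys. If you do not mind slightly dated material, here are the notes I wrote on the documentation when I was first learning Erlang: [Erlang STDLIB, annotated Chinese edition]
 
 Worth a special mention are shell and shell_default: if you are curious about the Erlang shell, the answers are here, and the so-called "spooky problems in EShell" get a reasonable explanation.
 The other libraries under lib have clearly defined purposes and are easy to locate, for example xmerl for XML processing and the mnesia database. With a little help from Google there is almost no obstacle.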
 
 

Dive into ERTS

 
Atom and bifs
 
  Under https://github.com/erlang/otp/tree/maint/erts/emulator/beam there are a couple of index files:
 
atom.names  lists the atoms used by ERTS; well worth reading to pick up the idioms
bif.tab     the list of BIFs. Note its comment: Use "ubif" for guard BIFs and operators; use "bif" for ordinary BIFs.
 
 
Basic Type
 
Open https://github.com/erlang/otp/blob/maint/erts/emulator/beam/sys.h

/*
** Data types:
**
** Eterm: A tagged erlang term (possibly 64 bits)
** BeamInstr: A beam code instruction unit, possibly larger than Eterm, not smaller.
** UInt:  An unsigned integer exactly as large as an Eterm.
** SInt:  A signed integer exactly as large as an eterm and therefor large
**        enough to hold the return value of the signed_val() macro.
** UWord: An unsigned integer at least as large as a void * and also as large
**          or larger than an Eterm
** SWord: A signed integer at least as large as a void * and also as large
**          or larger than an Eterm
** Uint32: An unsigned integer of 32 bits exactly
** Sint32: A signed integer of 32 bits exactly
** Uint16: An unsigned integer of 16 bits exactly
** Sint16: A signed integer of 16 bits exactly.
*/

 

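Type conversions: https://github.com/erlang/otp/blob/maint/erts/emulator/beam/big.c
 
The code that constructs Erlang terms is in https://github.com/erlang/otp/blob/maint/erts/emulator/beam/erl_term.h
 
That header also shows the internal representation of some of the more complex data structures.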
 
 
Two examples
 
Let us look at two examples. The first: how is append on lists implemented? lists.erl is easy to find:
 
https://github.com/erlang/otp/blob/maint/lib/stdlib/src/lists.erl
 
append(L1, L2) -> L1 ++ L2.
 
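So append simply uses ++. Where, then, is ++ implemented?
In the directory https://github.com/erlang/otp/tree/maint/erts/emulator/beam there is a series of erl_bif_*.c files, which hold the BIF implementations for the corresponding modules. Open
https://github.com/erlang/otp/blob/maint/erts/emulator/beam/erl_bif_lists.c and the code we want turns up quickly. Yes, it is the code in the screenshot I posted in the group, so I will not repeat it here.
One interesting spot is these two lines: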
 
copy = last = CONS(hp, CAR(list_val(list)), make_list(hp + 2));
list = CDR(list_val(list));

 

 
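Someone will say: CAR, CDR and CONS look very familiar. Exactly: they are the three basic Lisp list primitives, which take the head of a list, take the rest of the list after the head, and construct a list cell. Jumping to their implementation in erl_term.h: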

#define CONS(hp, car, cdr) \
        (CAR(hp)=(car), CDR(hp)=(cdr), make_list(hp))
 
#define CAR(x)  ((x)[0])
#define CDR(x)  ((x)[1])

  

 
The second example: what does the definition of a process look like?
 
First, find the definition of Process in erl_process.h:
 
typedef struct process Process;
 
Then follow it to the definition of struct process:
struct process {
    ErtsPTabElementCommon common; /* *Need* to be first in struct */
 
    /* All fields in the PCB that differs between different heap
     * architectures, have been moved to the end of this struct to
     * make sure that as few offsets as possible differ. Different
     * offsets between memory architectures in this struct, means that
     * native code have to use functions instead of constants.
     */
 
    Eterm* htop;        /* Heap top */
    Eterm* stop;        /* Stack top */
    Eterm* heap;        /* Heap start */
    Eterm* hend;        /* Heap end */
    Uint heap_sz;       /* Size of heap in words */
    Uint min_heap_size;         /* Minimum size of heap (in words). */
    Uint min_vheap_size;        /* Minimum size of virtual heap (in words). */
 
#if !defined(NO_FPE_SIGNALS) || defined(HIPE)
    volatile unsigned long fp_exception;
#endif
 
#ifdef HIPE
    /* HiPE-specific process fields. Put it early in struct process,
       to enable smaller & faster addressing modes on the x86. */
    struct hipe_process_state hipe;
#endif
 
    /*
     * Saved x registers.
     */
    Uint arity;         /* Number of live argument registers (only valid
                 * when process is *not* running).
                 */
    Eterm* arg_reg;     /* Pointer to argument registers. */
    unsigned max_arg_reg;   /* Maximum number of argument registers available. */
    Eterm def_arg_reg[6];   /* Default array for argument registers. */
 
    BeamInstr* cp;      /* (untagged) Continuation pointer (for threaded code). */
    BeamInstr* i;       /* Program counter for threaded code. */
    Sint catches;       /* Number of catches on stack */
    Sint fcalls;        /*
                 * Number of reductions left to execute.
                 * Only valid for the current process.
                 */
    Uint32 rcount;      /* suspend count */
    int  schedule_count;    /* Times left to reschedule a low prio process */
    Uint reds;          /* No of reductions for this process  */
    Eterm group_leader;     /* Pid in charge
                   (can be boxed) */
    Uint flags;         /* Trap exit, etc (no trace flags anymore) */
    Eterm fvalue;       /* Exit & Throw value (failure reason) */
    Uint freason;       /* Reason for detected failure */
    Eterm ftrace;       /* Latest exception stack trace dump */
 
    Process *next;      /* Pointer to next process in run queue */
 
    struct ErtsNodesMonitor_ *nodes_monitors;
 
    ErtsSuspendMonitor *suspend_monitors; /* Processes suspended by
                         this process via
                         erlang:suspend_process/1 */
 
    ErlMessageQueue msg;    /* Message queue */
 
    union {
    ErtsBifTimer *bif_timers;   /* Bif timers aiming at this process */
    void *terminate;
    } u;
 
    ProcDict  *dictionary;       /* Process dictionary, may be NULL */
 
    Uint seq_trace_clock;
    Uint seq_trace_lastcnt;
    Eterm seq_trace_token;  /* Sequential trace token (tuple size 5 see below) */
 
#ifdef USE_VM_PROBES
    Eterm dt_utag;              /* Place to store the dynamc trace user tag */
    Uint dt_utag_flags;         /* flag field for the dt_utag */
#endif      
    BeamInstr initial[3];   /* Initial module(0), function(1), arity(2), often used instead
                   of pointer to funcinfo instruction, hence the BeamInstr datatype */
    BeamInstr* current;     /* Current Erlang function, part of the funcinfo:
                 * module(0), function(1), arity(2)
                 * (module and functions are tagged atoms;
                 * arity an untagged integer). BeamInstr * because it references code
                 */
     
    /*
     * Information mainly for post-mortem use (erl crash dump).
     */
    Eterm parent;       /* Pid of process that created this process. */
    erts_approx_time_t approx_started; /* Time when started. */
 
    /* This is the place, where all fields that differs between memory
     * architectures, have gone to.
     */
 
    Eterm *high_water;
    Eterm *old_hend;            /* Heap pointers for generational GC. */
    Eterm *old_htop;
    Eterm *old_heap;
    Uint16 gen_gcs;     /* Number of (minor) generational GCs. */
    Uint16 max_gen_gcs;     /* Max minor gen GCs before fullsweep. */
    ErlOffHeap off_heap;    /* Off-heap data updated by copy_struct(). */
    ErlHeapFragment* mbuf;  /* Pointer to message buffer list */
    Uint mbuf_sz;       /* Size of all message buffers */
    ErtsPSD *psd;       /* Rarely used process specific data */
 
    Uint64 bin_vheap_sz;    /* Virtual heap block size for binaries */
    Uint64 bin_vheap_mature;    /* Virtual heap block size for binaries */
    Uint64 bin_old_vheap_sz;    /* Virtual old heap block size for binaries */
    Uint64 bin_old_vheap;   /* Virtual old heap size for binaries */
 
    ErtsProcSysTaskQs *sys_task_qs;
 
    erts_smp_atomic32_t state;  /* Process state flags (see ERTS_PSFLG_*) */
 
#ifdef ERTS_SMP
    ErlMessageInQueue msg_inq;
    ErtsPendExit pending_exit;
    erts_proc_lock_t lock;
    ErtsSchedulerData *scheduler_data;
    Eterm suspendee;
    ErtsPendingSuspend *pending_suspenders;
    erts_smp_atomic_t run_queue;
#ifdef HIPE
    struct hipe_process_state_smp hipe_smp;
#endif
#endif
 
#ifdef CHECK_FOR_HOLES
    Eterm* last_htop;       /* No need to scan the heap below this point. */
    ErlHeapFragment* last_mbuf; /* No need to scan beyond this mbuf. */
#endif
 
#ifdef DEBUG
    Eterm* last_old_htop;   /*
                 * No need to scan the old heap below this point
                 * when looking for invalid pointers into the new heap or
                 * heap fragments.
                 */
#endif
 
#ifdef FORCE_HEAP_FRAGS
    Uint space_verified;        /* Avoid HAlloc forcing heap fragments when */
    Eterm* space_verified_from; /* we rely on available heap space (TestHeap) */
#endif
};

  

 
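Zhuangzi said: "My life has a limit, but knowledge has none. To chase the limitless with the limited is perilous!" So take what you need. That is all for today; enjoy the journey.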
 
 
[0] Routemap source tree https://github.com/erlang/otp/wiki/Routemap-source-tree
[1] A GUIDE TO THE ERLANG SOURCE  https://erlangcentral.org/wiki/index.php/A_Guide_To_The_Erlang_Source 


This article is reposted from 坚强2002's blog on 博客园 (cnblogs). Original link:

http://www.cnblogs.com/me-sa/p/erlang_source_code_guide.html

To repost it, please contact the original author.
