Linux Kernel Code Statistics

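This uses 4.8.7, the latest version at the time of writing.
I had always heard that the kernel has over ten million lines of code, more than anyone could read in a lifetime, and so on. The point was probably to show the power of distributed, collaborative development, but it can just as easily discourage beginners.
In fact, most of the bulk comes from code that was extended in parallel (many implementations of the same kind of thing) or from loadable modules, as a simple count shows: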

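# Count lines in .c, .h and .S files for each top-level directory.
# src_home must point at the root of the kernel source tree.
total=0
cd "${src_home}" || exit 1
printf 'Directory\t.c\t.h\t.S\tall\n'

for dir in *
do
        if [ -d "$dir" ]
        then
                dir_total=0
                printf '%s' "$dir"
                for suffix in c h S
                do
                        # xargs may run wc more than once, so sum every "total" line.
                        # wc prints no "total" line for a single file, so a directory
                        # with exactly one matching file is counted as 0.
                        language_total=$(find "$dir" -name "*.$suffix" -print0 | LC_ALL=C xargs -0 wc -l |
                                awk '$2 == "total" { sum += $1 } END { print sum + 0 }')
                        printf '\t%s' "$language_total"
                        dir_total=$((dir_total + language_total))
                done
                total=$((total + dir_total))
                printf '\t%s\n' "$dir_total"
        fi
done
printf 'Total\t%s\n' "$total"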

Results:

Directory              .c       .h      .S       all
arch              1652587   852684  398317   2903588
block               37632     1082       0     38714
certs                   0        0       0         0
crypto              57622    35217       0     92839
Documentation        6838        0     533      7371
drivers          10408276  1987632    2058  12397966
firmware                0        0    2425      2425
fs                1075296   114030       0   1189326
include                 0   792635       0    792635
init                 3603        0       0      3603
ipc                  8763        0       0      8763
kernel             251407    13751       0    265158
lib                108163     3570       0    111733
mm                 118954     1119       0    120073
net                904077    39889       0    943966
samples             11827     1141       0     12968
scripts             33809     6598       0     40407
security            70227     7071       0     77298
sound              836676   132485       0    969161
tmp                     0        0       0         0
tools              240549    30066    4698    275313
usr                     0        0       0         0
virt                13118      500       0     13618
Total                                       20266925
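
One caveat about the counting script: it adds up the "total" lines printed by wc, and wc only prints such a line when it is given more than one file. A directory with exactly one matching file therefore contributes 0, which may be why certs and usr show 0 here. The following is a minimal sketch of a variant that avoids this by concatenating the files first; the helper name count_lines is my own and is not part of the original script.

```shell
#!/bin/sh
# count_lines DIR SUFFIX (hypothetical helper): print the number of lines in
# all regular files under DIR whose names end in .SUFFIX.
count_lines() {
        # cat joins every match into one stream, so wc reports a single
        # number whether find matched zero, one, or many files.
        # tr strips the padding that some wc implementations add.
        find "$1" -type f -name "*.$2" -exec cat {} + | wc -l | tr -d ' '
}
```

Run from the kernel source root, a call such as count_lines certs c would then report the real line count for that directory rather than 0. Note that a final line without a trailing newline is still not counted, as with wc in general.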

For comparison, here are the results from the sloccount tool:

SLOC    Directory       SLOC-by-Language (Sorted)
9091015 drivers         ansic=9082372,perl=4459,yacc=1688,asm=1482,lex=779,
                        lisp=218,sh=17
2045118 arch            ansic=1758041,asm=282230,perl=3065,sh=1024,awk=482,
                        pascal=231,python=45
837319  fs              ansic=837319
736164  sound           ansic=735981,asm=183
675919  net             ansic=675798,awk=121
484519  include         ansic=480952,cpp=3525,asm=42
207942  tools           ansic=190227,sh=6344,perl=3977,python=3812,asm=1459,
                        yacc=1211,lex=526,awk=386
170007  kernel          ansic=170007
82387   lib             ansic=82255,perl=119,awk=13
77986   crypto          ansic=77986
76457   mm              ansic=76457
57529   scripts         ansic=29045,perl=14302,python=5439,sh=3688,cpp=2506,
                        yacc=1428,lex=1113,awk=8
52134   security        ansic=52134
25190   block           ansic=25190
10167   samples         ansic=9438,sh=729
9329    Documentation   ansic=5450,perl=1210,sh=1188,python=1123,asm=214,
                        awk=128,sed=16
9319    virt            ansic=9319
6250    ipc             ansic=6250
2699    init            ansic=2699
1877    firmware        asm=1660,ansic=217
558     usr             ansic=544,asm=14
192     certs           ansic=162,asm=30
20      tmp             sh=20
0       top_dir         (none)


Totals grouped by language (dominant language first):
ansic:     14307843 (97.60%)
asm:         287314 (1.96%)
perl:         27132 (0.19%)
sh:           13010 (0.09%)
python:       10419 (0.07%)
cpp:           6031 (0.04%)
yacc:          4327 (0.03%)
lex:           2418 (0.02%)
awk:           1138 (0.01%)
pascal:         231 (0.00%)
lisp:           218 (0.00%)
sed:             16 (0.00%)

Total Physical Source Lines of Code (SLOC)                = 14,660,097
...

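The tool's numbers are lower than the simple raw count, because sloccount leaves out blank lines and comments, but each directory stays at roughly the same order of magnitude.
As you can see, the three largest directories, drivers, fs and arch, make up more than 75% of the total (about 81% by either count).
Most of this is loadable modules extended in parallel: there is so much code because there are so many platforms, devices and filesystems to support.
To read it, you only need to pick one typical example as a reference, such as x86 + ext4.
Next come sound and net, each close to a million lines in the raw count; sound is also driver-like code, and net is the network protocol stack, which really is fairly complex but can still be broken down further.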
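
The share quoted for the three largest directories can be checked with a little arithmetic. This is only a sketch: the figures are copied from the sloccount output above, and the helper name share_percent is my own.

```shell
#!/bin/sh
# SLOC figures copied from the sloccount output.
total=14660097
drivers=9091015
arch=2045118
fs=837319

# share_percent PART TOTAL (hypothetical helper): print PART as a
# percentage of TOTAL, rounded to one decimal place.
share_percent() {
        awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

# Combined share of drivers, arch and fs.
share_percent "$((drivers + arch + fs))" "$total"   # prints 81.7
```

The same helper works for any other row, so it is easy to see how small a given directory is relative to the whole tree.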

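The real core, however, is small: kernel and mm (memory management) come to only about 265,000 and 120,000 raw lines (170,007 and 76,457 SLOC). That is a tiny share of the total, and entirely feasible to read in full.
So don't be afraid: with enough determination, this beast can certainly be tamed.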
