BitTorrent (1): File Structure

It all started with a bat: stuck at home and bored, downloading movies, I decided to look into the structure of BT files.

bencode

BT torrent files are bencoded. bencode has four data types: string, integer, list, and dictionary.

The string type

A string is encoded as [length]:[string], where length is the number of bytes. For example:

"abcd" → 4:abcd

The int type

An integer is encoded as i[int]e. For example:

12345 → i12345e

The List type

A list (think List<object>) is encoded as l[object]e, with the encoded elements written one after another. For example:

List<"abc","ghe"> → l3:abc3:ghee
List<"abc",123> → l3:abci123ee

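The Dictionary type

A dictionary (think Dictionary<string, object>) is encoded as d[KeyValue]e. Keys must be strings and are written in sorted order, so in the example below age comes before name:

Dictionary<{"name":"bob"},{"age":20}> → d3:agei20e4:name3:bobe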
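The four rules above are small enough to sketch as an encoder. This is a minimal illustration rather than a reference implementation; the function name `bencode` is my own choice, and it assumes Python `str` values should be written as UTF-8 bytes.

```python
def bencode(value):
    """Encode a str/bytes, int, list, or dict into bencoded bytes."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but bencode has no boolean type.
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is stored as UTF-8
    if isinstance(value, bytes):
        # The length prefix counts bytes, not characters.
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are strings and must be emitted in sorted (raw byte) order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")
```

Calling `bencode({"name": "bob", "age": 20})` goes through the dictionary branch, which is where the key sorting happens.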

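Torrent structure

Once decoded, we can look at the structure of the file. It differs slightly between single-file and multi-file torrents: the leading fields are the same and the difference is in the info part. The main fields are described below.

Multi-file case: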

Torrent
├─ announce         primary tracker URL
├─ announce-list    list of tracker URLs
├─ comment          comment
├─ creation date    creation timestamp
├─ created by       creator
├─ encoding         default encoding
├─ info             download-related information
│  ├─ files
│  │  ├─ length
│  │  ├─ path
│  │  └─ path.utf-8
│  ├─ name
│  ├─ name.utf-8
│  ├─ piece length
│  ├─ pieces
│  ├─ publisher
│  ├─ publisher-url
│  ├─ publisher-url.utf-8
│  └─ publisher.utf-8
└─ nodes

Single-file case:

Torrent
├─announce
├─announce-list
├─comment
├─comment.utf-8
├─creation date
├─encoding
├─info
│ ├─length
│ ├─name
│ ├─name.utf-8
│ ├─piece length
│ ├─pieces
│ ├─publisher
│ ├─publisher-url
│ ├─publisher-url.utf-8
│ └─publisher.utf-8
└─nodes
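To get from the raw bytes to the trees above, something has to decode the file first. Here is a rough decoder sketch, plus a check for which of the two layouts a torrent uses. The names `bdecode`, `load_torrent` and `torrent_layout` are hypothetical, the decoder does little validation, and it returns strings as `bytes`, so dictionary keys look like `b"info"`.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value starting at data[pos]; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        result, pos = {}, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected data at offset {pos}")


def load_torrent(path):
    """Read a .torrent file from disk and return the decoded top-level dictionary."""
    with open(path, "rb") as fh:
        meta, _ = bdecode(fh.read())
    return meta


def torrent_layout(meta):
    """Return 'multi-file' or 'single-file' depending on what info contains."""
    info = meta[b"info"]
    if b"files" in info:
        return "multi-file"
    if b"length" in info:
        return "single-file"
    raise ValueError("info has neither 'files' nor 'length'")
```

The only thing `torrent_layout` relies on is what the two trees show: a multi-file info has files, a single-file info has length directly.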

Info

files

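In a multi-file torrent, files describes the name and size of each file. Each entry has three fields:

length: the size of the file, in bytes

path: the file's name (stored as a list of path components); cannot be changed

path.utf-8: the UTF-8 encoding of the file name

Each file has its own set of these three fields.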
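As an illustration of how these per-file entries might be read, here is a small sketch that lists every file with its size. It assumes the info dictionary has already been decoded with `bytes` keys (as a typical bencode decoder would return it) and that path.utf-8 should be preferred when present; `list_files` is a name I made up.

```python
def list_files(info):
    """Return [(relative_path, size_in_bytes), ...] for a multi-file info dict."""
    root = (info.get(b"name.utf-8") or info[b"name"]).decode("utf-8", errors="replace")
    result = []
    for entry in info[b"files"]:
        # path is a list of components, e.g. [b"cd1", b"a.mp3"].
        parts = entry.get(b"path.utf-8") or entry[b"path"]
        names = [part.decode("utf-8", errors="replace") for part in parts]
        result.append(("/".join([root, *names]), entry[b"length"]))
    return result


# Hand-written sample data standing in for a decoded torrent.
sample_info = {
    b"name": b"album",
    b"files": [
        {b"length": 5, b"path": [b"cd1", b"a.mp3"]},
        {b"length": 7, b"path": [b"b.txt"], b"path.utf-8": [b"b.txt"]},
    ],
}
for path, size in list_files(sample_info):
    print(f"{size:>10}  {path}")
```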

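Other fields

name: suggested folder name; can be changed when downloading
name.utf-8: the UTF-8 encoding of the above
piece length: the size of each piece, in bytes
pieces: the piece hashes. The content is split into pieces of the size given above, the SHA-1 digest of each piece is computed, and the digests are concatenated. Since a SHA-1 digest is 20 bytes, the length of this field is always a multiple of 20 bytes. It also means that the more pieces there are, the larger the torrent file gets.
publisher: name of the publisher
publisher-url: URL of the publisher
publisher.utf-8 and publisher-url.utf-8: the UTF-8 encoded versions of the two fields above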
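The relationship between piece length and pieces can be sketched in a few lines. The helper names below are hypothetical, and the example builds pieces from an in-memory byte string rather than from real files.

```python
import hashlib

SHA1_SIZE = 20  # a SHA-1 digest is always 20 bytes


def make_pieces(data, piece_length):
    """Concatenate the SHA-1 digest of every piece_length-sized chunk of data."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def split_pieces(pieces):
    """Split the pieces field back into individual 20-byte digests."""
    if len(pieces) % SHA1_SIZE != 0:
        raise ValueError("pieces length is not a multiple of 20")
    return [pieces[i:i + SHA1_SIZE] for i in range(0, len(pieces), SHA1_SIZE)]


def expected_piece_count(total_length, piece_length):
    """Number of pieces needed to cover total_length bytes (ceiling division)."""
    return -(-total_length // piece_length)


data = b"x" * 40000
pieces = make_pieces(data, 16384)
print(len(pieces), len(split_pieces(pieces)), expected_piece_count(40000, 16384))
```

With 40000 bytes and a 16384-byte piece length there are three pieces (the last one is shorter), so pieces is 60 bytes long. Halving the piece length would roughly double that, which is the size trade-off mentioned above.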

nodes

This field contains a list of IP addresses and their ports, used as the initial nodes for connecting to the DHT.
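A possible way to read this field, assuming it decodes to a list of [host, port] pairs with the host as a byte string and the port as an integer. `parse_nodes` is a hypothetical helper, and the sample addresses are placeholders.

```python
def parse_nodes(meta):
    """Return [(host, port), ...] from the top-level nodes field, if present."""
    nodes = []
    for host, port in meta.get(b"nodes", []):
        nodes.append((host.decode("ascii", errors="replace"), port))
    return nodes


# Hand-written sample standing in for a decoded torrent.
sample_meta = {b"nodes": [[b"127.0.0.1", 6881], [b"router.example.org", 6881]]}
for host, port in parse_nodes(sample_meta):
    print(f"{host}:{port}")
```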

I'll look at the BT protocol itself next time…