Mastering IPFS: Getting Content from IPFS (Part 1)

In the previous articles, we analyzed how files are saved. We know that a file nobody has accessed is stored only locally, and that it becomes available on the IPFS network once it has been accessed at least once. Today, let's look at how to download a file that has already been saved on the IPFS network to the local machine. We will use a picture of the famous cat as our example. The picture is addressed by the hash QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ, and the sample code for downloading it is as follows:

    const { createNode } = require('ipfs')
    const fs = require('fs')

    const node = createNode()

    node.on('ready', async () => {
      // get() resolves to an array of file objects; the first entry is the requested file
      const files = await node.get('QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ/cat.jpg')

      fs.writeFile('cat.jpg', files[0].content, (err) => {
        if (err) throw err
        console.log('cat saved!')
      })
    })

Run this code and the cat picture will be created in the current directory. Let's enjoy the beautiful cat first!

After admiring the cat, let's see how the code is executed. The core of the code above is the get method, which is located in the core/components/files-regular/get.js file of IPFS. Its contents are as follows:

    (ipfsPath, options, callback) => {
      if (typeof options === 'function') {
        callback = options
        options = {}
      }

      options = options || {}

      pull(
        self.getPullStream(ipfsPath, options),
        pull.asyncMap((file, cb) => {
          if (file.content) {
            pull(
              file.content,
              pull.collect((err, buffers) => {
                if (err) { return cb(err) }
                file.content = Buffer.concat(buffers)
                cb(null, file)
              })
            )
          } else {
            cb(null, file)
          }
        }),
        pull.collect(callback)
      )
    }

This anonymous function receives the path we passed in. It uses the pull function of the pull-stream library to connect three streams. The first is the stream returned by the getPullStream method of the IPFS object, which fetches the file. The second is a pull.asyncMap stream, which collects the content stream of each fetched file into a single Buffer. The third is a pull.collect stream, which gathers the resulting files and hands them to the callback function we supplied.
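To make this data flow concrete, here is a minimal sketch of the same source, asyncMap, collect pattern. It does not use the pull-stream library itself: values, asyncMap, and collect below are simplified stand-ins written for this illustration, and the sample file object is made up. They follow the pull-stream convention that a source is a function read(abort, cb).

```javascript
// Simplified stand-ins for pull-stream primitives (illustration only).

// Source: emits each array element, then signals the end with cb(true).
function values (array) {
  let i = 0
  return (abort, cb) => {
    if (abort) return cb(abort)
    if (i >= array.length) return cb(true)
    cb(null, array[i++])
  }
}

// Through: applies an asynchronous function to every item of a source.
function asyncMap (fn) {
  return (read) => (abort, cb) => {
    read(abort, (end, data) => {
      if (end) return cb(end)
      fn(data, (err, mapped) => (err ? cb(err) : cb(null, mapped)))
    })
  }
}

// Sink: drains a source into an array and passes it to done(err, items).
// It recurses on every item, which is fine for the short demo streams here.
function collect (done) {
  return (read) => {
    const items = []
    const next = () => read(null, (end, data) => {
      if (end === true) return done(null, items)
      if (end) return done(end)
      items.push(data)
      next()
    })
    next()
  }
}

// A made-up file whose content is itself a stream of chunks.
const file = {
  path: 'cat.jpg',
  content: values([Buffer.from('first chunk, '), Buffer.from('second chunk')])
}

let result
collect((err, files) => {
  if (err) throw err
  result = files
})(
  asyncMap((f, cb) => {
    // Drain the content stream and replace it with one concatenated Buffer.
    collect((err, buffers) => {
      if (err) return cb(err)
      f.content = Buffer.concat(buffers)
      cb(null, f)
    })(f.content)
  })(values([file]))
)

console.log(result[0].path, result[0].content.toString())
```

The nested collect call plays the role of the inner pull(file.content, pull.collect(...)) in get.js: the caller ends up with plain Buffers instead of streams.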

From this brief analysis, we can see that files are fetched from the IPFS network mainly through the getPullStream method of the IPFS object. This method is registered as one of the IPFS object's methods in the core/index.js file while the IPFS object is being created. Its main job is to return a pull-stream stream. The code is as follows:

    pull(
      exporter(ipfsPath, self._ipld, options),
      pull.map(file => {
        file.hash = file.cid.toString()
        delete file.cid
        return file
      })
    )

The above code is defined in the core/components/files-regular/get-pull-stream.js file. Its body is again the pull function of the pull-stream library. The first argument is the stream returned by the exporter function of the ipfs-unixfs-exporter library, which fetches the file we want from the IPFS network. The second argument is a pull-stream map stream, which sets the hash property of each file object from its CID and then removes the cid property.

Next, let's look at the exporter function, whose code is in the index.js file of the ipfs-unixfs-exporter library. Its execution logic is as follows:

  1. Process the passed path and return an object that contains the base path and the rest of the path.

         let dPath
         try {
           dPath = pathBaseAndRest(path)
         } catch (err) {
           return error(err)
         }

     Here the path we pass in is QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ/cat.jpg, so the base path of the returned object is QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ, and the rest array contains the single element cat.jpg.

  2. Get the length of the path without the final name.

         const pathLengthToCut = join([dPath.base].concat(dPath.rest.slice(0, dPath.rest.length - 1))).length

  3. Generate a CID object from the base path.

         const cid = new CID(dPath.base)

     A CID consists of four parts: multibase, version number, multicodec, and multihash. When we pass QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ to the CID constructor, it sets the version number to 0 and the multicodec to dag-pb, and calls the multihash function to convert the Base58 string into a multihash.

  4. Finally, use the pull function to return a pull-stream stream.

         return pull(
           values([{
             cid,
             name: dPath.base,
             path: dPath.base,
             pathRest: dPath.rest,
             depth: 0
           }]),
           createResolver(dag, options),
           filter(Boolean),
           map((node) => {
             return {
               depth: node.depth,
               name: node.name,
               path: options.fullPath ? node.path : finalPathFor(node),
               size: node.size,
               cid: node.cid,
               content: node.content,
               type: node.type
             }
           })
         )

     The returned stream is consumed by the pull function in get-pull-stream.js to fetch the specified file from the IPFS network.
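Steps 1 to 3 can be tried out without any IPFS libraries. The following is a rough sketch under stated assumptions: splitBaseAndRest is a hypothetical stand-in for pathBaseAndRest (it only splits on /), and base58Decode is a hand-written decoder rather than the function the CID class really calls. It relies only on the fact that a version 0 CID is a bare Base58 multihash whose first two bytes give the hash function code and the digest length.

```javascript
// Hypothetical stand-in for pathBaseAndRest: the first segment is the base,
// the remaining segments are the rest.
function splitBaseAndRest (path) {
  const parts = path.split('/').filter(Boolean)
  if (parts.length === 0) throw new Error('path must not be empty')
  return { base: parts[0], rest: parts.slice(1) }
}

const BASE58 = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

// Decode a Base58 (Bitcoin alphabet) string into bytes.
function base58Decode (text) {
  let value = 0n
  for (const char of text) {
    const digit = BASE58.indexOf(char)
    if (digit < 0) throw new Error('invalid Base58 character: ' + char)
    value = value * 58n + BigInt(digit)
  }
  let hex = value === 0n ? '' : value.toString(16)
  if (hex.length % 2 === 1) hex = '0' + hex
  // Every leading '1' stands for one leading zero byte.
  const leadingZeros = text.length - text.replace(/^1+/, '').length
  return Buffer.concat([Buffer.alloc(leadingZeros), Buffer.from(hex, 'hex')])
}

// Step 1: split the example path.
const dPath = splitBaseAndRest('QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ/cat.jpg')

// Step 2: length of the path without its final name.
const pathLengthToCut = [dPath.base]
  .concat(dPath.rest.slice(0, dPath.rest.length - 1))
  .join('/').length

// Step 3: decode the Base58 base path into a multihash.
const multihash = base58Decode(dPath.base)
const hashFunctionCode = multihash[0] // 0x12 identifies sha2-256
const digestLength = multihash[1] // 0x20 means a 32-byte digest

console.log(dPath, pathLengthToCut)
console.log(hashFunctionCode.toString(16), digestLength, multihash.length)
```

For a path with only one name after the base, the length to cut is simply the length of the base, which is why the name that is finally reported can be shortened to cat.jpg.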

The above code is briefly explained as follows:

  • The values function is a source stream defined in the pull-stream library. It creates a source stream that reads values from an array or object and then terminates. Here we build an object from the passed path and the CID generated from it, and wrap it in an array that becomes the content of the stream.
  • The createResolver function is defined in the resolve.js file in the current directory. It receives the dag object and returns a pull-stream stream, where the dag object is the _ipld property of the IPFS object. The returned stream reads the item to be fetched from the previous stream and then calls the get method of the dag object to fetch the specified file. The details are analyzed below.
  • The map function is a through stream defined in the pull-stream library that transforms each element with a user-specified function. The handler here is relatively simple: it builds and returns a new object from the object produced by the previous stream. That object is what we finally see in the sample program, except that it does not yet have the hash property and still carries the cid property.

Below, we focus on the createResolver function, which is defined in the resolve.js file of the ipfs-unixfs-exporter library. Its body uses the pull function to build and return a pull-stream stream. The code is as follows:

    pull(
      paramap((item, cb) => {
        if ((typeof item.depth) !== 'number') {
          return error(new Error('no depth'))
        }

        if (item.object) {
          return cb(null, resolveItem(null, item.object, item, options))
        }

        waterfall([
          function (done) {
            dag.get(item.cid, done)
          },
          function (node, done) {
            // node is the result of deserializing the block object; it may be the
            // top-level DAG node of a file, or a whole file (when it was not chunked)
            done(null, resolveItem(item.cid, node.value, item, options))
          }
        ], cb)
      }),
      flatten(),
      filter(Boolean),
      filter((node) => node.depth <= options.maxDepth)
    )

The above code is briefly explained as follows:

  1. First, call the paramap function, which returns a pull-paramap stream. Inside this stream, the waterfall method of the async library calls the get method of the IPLD object to fetch the block object from the local node or from other nodes, and then calls the resolveItem method to process the block object obtained (which may be a complete file, a fragment of a file, a directory, and so on).

     pull-paramap is a pull-stream stream that takes three arguments. The first is a function with the signature (data, cb), in which the user-defined business logic runs; the second and third arguments are optional.

     pull-paramap reads data from the previous stream, runs the given function on the items in parallel, and passes the results on to the next stream in the same order in which the source data arrived. The asynchronous handler passed to paramap here works as follows:

     • Check whether the depth property of the current object is a number; if it is not, return an error. The current object here is the one produced by the preceding values stream.
     • If the current object has an object property, call resolveItem to resolve it and pass the result to the next stream. Our object has no object property, so this code does not run.
     • Call the waterfall method of the async library, which calls the get method of the IPLD object to fetch the block object from the local node or from other nodes, and then calls the resolveItem method to process it.

  2. Then, pass the resolved results through the flatten and filter streams of the pull-stream library.
  3. Finally, use another pull-stream filter stream to drop block objects that exceed the specified depth.

From the above explanation, we can see that the most important business logic is fetching the block object and processing it. Below we analyze these two aspects in depth.
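Before going deeper, the order-preserving parallel behaviour attributed to pull-paramap above can be illustrated with a small sketch. parallelMapOrdered is a hypothetical helper written for this article, not the pull-paramap implementation, and it gathers the results into an array for simplicity. The workers park their completions so that the demo can finish them in reverse order on purpose.

```javascript
// Hypothetical helper: start worker(item, cb) for every item at once,
// but deliver the results in the original input order.
function parallelMapOrdered (items, worker, done) {
  const results = new Array(items.length)
  let pending = items.length
  let failed = false
  if (pending === 0) return done(null, results)

  items.forEach((item, index) => {
    worker(item, (err, value) => {
      if (failed) return
      if (err) {
        failed = true
        return done(err)
      }
      results[index] = value // slot is fixed by input position, not by finish time
      if (--pending === 0) done(null, results)
    })
  })
}

// Workers park their completions here so the demo can finish them in any order.
const parked = []
let output = null

parallelMapOrdered(['a', 'b', 'c'], (name, cb) => {
  parked.push({ name, finish: () => cb(null, name.toUpperCase()) })
}, (err, results) => {
  if (err) throw err
  output = results
})

// All three workers have started; now complete them in reverse order.
const completionOrder = parked.reverse().map((job) => {
  job.finish()
  return job.name
})

console.log('completion order:', completionOrder)
console.log('result order:', output)
```

Even though the last item finishes first, the results come out in input order, which is the property the exporter relies on when several blocks are fetched at the same time.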

    1. Getting the block object

    In the above code, we fetch the block object by calling the get method of the dag object. The dag object is an IPLDResolver object from the ipld library, which is created and set on the IPFS object when the IPFS object is initialized. The get method executes as follows:

    1. If the path parameter is a function, reassign the parameters. In the call above, the second argument is the internal done function supplied by waterfall, so the following code runs and resets both variables.

           if (typeof path === 'function') {
             callback = path
             path = undefined
           }

    2. If the options argument is a function, reassign the parameters. In the call above no options argument is passed, so the following code does not run.

           if (typeof options === 'function') {
             callback = options
             options = {}
           }

    3. Normalize the path parameter.

           if (typeof path === 'string') {
             path = joinPath('/', path)
               .substr(1)
               .split(osPathSep)
               .join('/')
           }

    4. If the path argument is an empty string or undefined, call the internal _get function and return its result through the asynchronous callback. The _get function is itself driven by the waterfall function. The code is as follows:

           waterfall([
             (cb) => this._getFormat(cid.codec, cb),
             (format, cb) => this.bs.get(cid, (err, block) => {
               if (err) return cb(err)
               cb(null, format, block)
             }),
             (format, block, cb) => {
               format.util.deserialize(block.data, (err, deserialized) => {
                 if (err) {
                   return cb(err)
                 }
                 cb(null, deserialized)
               })
             }
           ], callback)

       Inside waterfall, the _getFormat method is called first to get the format object for the codec of the CID; then the get method of the block service object fetches the block object; finally, the deserialize method of the format object's util object deserializes the block data returned by the block service.

       The block service object is defined in the index.js file of the ipfs-block-service library. Its get method decides whether to fetch the block object through the bitswap object or from the local repository, depending on whether a bitswap object is present. Its code is as follows:

           get (cid, callback) {
             if (this.hasExchange()) {
               this._bitswap.get(cid, callback)
             } else {
               this._repo.blocks.get(cid, callback)
             }
           }

       When the system starts, the bitswap object is not yet available while the help files in the init-docs directory are being processed. Only in that process are block objects saved to and read from the local repository directly; in all other cases the get method of the bitswap object is called to fetch the block object.

       The get method of the bitswap object delegates to its own getMany method, which proceeds as follows:

       • Initialize the internal variables: the wantList array is empty, promptedNetwork is false, and pendingStart is the number of CIDs requested.
       • Create a function object, getFromOutside, that fetches a block object from other nodes.
       • Call the map function of the async library to iterate over every CID to be requested, processing each one with the waterfall function as follows:
         • Call the has method of the block store object to check whether the requested block exists locally.
         • If the block exists locally: once all the requested CIDs have been processed, call the wantBlocks method of the WantManager to request the blocks that are still needed; then call the get method of the block store object to load the block locally and return it.
         • If the internal variable promptedNetwork is false: set it to true (so that this happens for only one request), and call the findAndConnect method of the network object for the first requested CID.
         • Call the function object getFromOutside. This function adds the CID to the wantList array, then calls the wantBlock method of the notifications object to register that we want the block. Once all the requested CIDs have been processed, it calls the wantBlocks method of the WantManager to request the required blocks.

       The notifications object is an internal module that tracks received blocks, wanted blocks, unwanted blocks, and more. The first parameter of its wantBlock method is the CID of the requested block; the second is a function that removes the block from the want list once it has been received, to avoid requesting it from another node again; the third is used to cancel the request for the block.

    5. When the path parameter is a non-empty string, the doUntil function of the async library is called instead. The processing inside doUntil is broadly similar to the above, and readers can analyze it on their own.
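The flow in step 4 (pick a format, fetch the block, deserialize it) together with the block service's choice between bitswap and the local repository can be sketched in plain JavaScript. Everything here is a simplified assumption made for illustration: waterfall is a minimal stand-in for the async library function, the repo, bitswap, and block service objects are in-memory fakes, the CIDs are made-up strings, and JSON stands in for the real block format.

```javascript
// Minimal stand-in for async.waterfall: each task receives the previous
// task's results followed by a callback; the first error stops the chain.
function waterfall (tasks, done) {
  const run = (index, args) => {
    if (index === tasks.length) return done(null, ...args)
    tasks[index](...args, (err, ...results) => {
      if (err) return done(err)
      run(index + 1, results)
    })
  }
  run(0, [])
}

// Fake local repository (made-up CID strings, JSON as the block data).
const repo = {
  blocks: new Map([['cid-local', '{"kind":"file"}']]),
  get (cid, cb) {
    if (!this.blocks.has(cid)) return cb(new Error('not found: ' + cid))
    cb(null, this.blocks.get(cid))
  }
}

// Fake exchange: serve local blocks first, otherwise "ask other nodes".
const remoteBlocks = new Map([['cid-remote', '{"kind":"directory"}']])
const bitswap = {
  get (cid, cb) {
    if (repo.blocks.has(cid)) return repo.get(cid, cb)
    if (!remoteBlocks.has(cid)) return cb(new Error('not found: ' + cid))
    cb(null, remoteBlocks.get(cid))
  }
}

// Same decision as the block service: use the exchange if there is one.
function createBlockService (exchange) {
  return {
    get (cid, cb) {
      if (exchange) {
        exchange.get(cid, cb)
      } else {
        repo.get(cid, cb)
      }
    }
  }
}

// Format lookup, block fetch, and deserialization chained with waterfall.
function getNode (blockService, cid, done) {
  waterfall([
    (cb) => cb(null, { deserialize: JSON.parse }),
    (format, cb) => blockService.get(cid, (err, block) => {
      if (err) return cb(err)
      cb(null, format, block)
    }),
    (format, block, cb) => cb(null, format.deserialize(block))
  ], done)
}

const online = createBlockService(bitswap)
const offline = createBlockService(null)
const outcomes = {}
const record = (key) => (err, node) => { outcomes[key] = err ? err.message : node.kind }

getNode(online, 'cid-remote', record('onlineRemote'))
getNode(offline, 'cid-remote', record('offlineRemote'))
getNode(offline, 'cid-local', record('offlineLocal'))
console.log(outcomes)
```

Without an exchange, only blocks already in the local repository can be resolved; with one, a block that is missing locally can still be obtained from other nodes.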

    2. Parsing the block object

    After calling the get method of the dag object to fetch the block object, is the processing finished? It is not. Recall that a Unix file system is a tree: it starts at the root directory /, below which are subdirectories; each subdirectory may contain further directories or files, which may in turn contain more, and so on without end. Besides directories, when the file we want is large, it is split on upload into a structure similar to a directory tree. At the top is the file's root DAGNode object, which links to child fragments through DAGLink objects; those fragments can link to further fragments through DAGLink objects, and so on. So after the block object has been fetched, the second function passed to waterfall calls the resolveItem function to parse it, and the rest of the content is then fetched based on that block.

    The resolveItem function simply delegates to another function, resolve. The code is as follows:

        function resolveItem (cid, node, item, options) {
          return resolve({
            cid,
            node,
            name: item.name,
            path: item.path,
            pathRest: item.pathRest,
            dag,
            parentNode: item.parent || parent,
            depth: item.depth,
            options
          })
        }

    The resolve function proceeds as follows:

    1. Call the typeOf function to detect the actual type of the block object.

           try {
             type = typeOf(node)
           } catch (err) {
             return error(err)
           }

       A block object may be a directory, hamt-sharded-directory, file, object, or raw, and each type has its own resolver.

    2. Get the resolver for that type from the resolvers object.

           const nodeResolver = resolvers[type]

       The resolvers object is defined at the beginning of the file and looks like this:

           const resolvers = {
             directory: require('./dir-flat'),
             'hamt-sharded-directory': require('./dir-hamt-sharded'),
             file: require('./file'),
             object: require('./object'),
             raw: require('./raw')
           }

    3. Call createResolver to create resolveDeep.
    4. Call the nodeResolver function to parse the block object. The nodeResolver function differs by type: when the block object is a directory, it is the dirExporter function defined in dir-flat.js; when it is a file, it is the function defined in file.js; when it is an object, it is the function defined in object.js. Fetching our cat involves two types, directory and file, so we analyze both. The dirExporter function executes as follows:
       • Set the first object to fetch.

             const accepts = pathRest[0]

         pathRest is the array of path segments generated from the path we passed to the get method, excluding the base part. In our example the array has only one element, cat.jpg.

       • Create a variable that represents the current directory.

             const dir = {
               name: name,
               depth: depth,
               path: path,
               cid,
               size: 0,
               type: 'dir'
             }

       • If the depth of the current object has reached the maximum depth specified in the options, return a pull-stream values source stream.

             if (options.maxDepth && options.maxDepth <= depth) {
               return values([dir])
             }

       • Build an array of streams.

             const streams = [
               pull(
                 values(node.links),
                 filter((item) => accepts === undefined || item.name === accepts),
                 map((link) => ({
                   depth: depth + 1,
                   size: 0,
                   name: link.name,
                   path: path + '/' + link.name,
                   cid: link.cid,
                   linkName: link.name,
                   pathRest: pathRest.slice(1),
                   type: 'dir'
                 })),
                 resolve
               )
             ]

         The streams in this array are concatenated at the end of the function by the pull-cat library. The code above does the following: iterate over all links in the current directory; filter out links that do not belong to the current request; build an object for each remaining link; finally, pass the object to the resolve stream, that is, the stream returned by the createResolver function. That stream has been analyzed above, so we do not repeat it here.

         In our example we specify the path QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ/cat.jpg. The part before the / is the base path, which represents a directory. That directory contains one file, our cat, so node.links holds a link that points to the CID of the cat picture. Through the pull-cat call, the stream returned by the createResolver function then actually requests the cat; how the file itself is processed is analyzed below.

       • If the path contains nothing beyond the base path, or if the fullPath option is specified, create a values stream from the dir variable and place it at the front of the streams array.
       • Call the pull-cat library to concatenate the streams. The main processing has been described above.

    That is how IPFS processes a directory.
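The directory handling above boils down to one rule: keep only the link whose name matches the next path segment, then recurse with the remaining segments. Here is a small synchronous sketch of that rule. The dagStore object, its made-up CID strings, and the resolvePath function are hypothetical and only mimic the shape of the data; the real code works on streams and on blocks fetched through the dag object.

```javascript
// Hypothetical in-memory DAG: made-up CID string -> node.
const dagStore = {
  'cid-root': {
    type: 'directory',
    links: [
      { name: 'cat.jpg', cid: 'cid-cat' },
      { name: 'readme.txt', cid: 'cid-readme' }
    ]
  },
  'cid-cat': { type: 'file', data: 'meow' },
  'cid-readme': { type: 'file', data: 'hello' }
}

// Resolve a node and everything requested below it.
// pathRest holds the path segments that have not been matched yet.
function resolvePath (cid, pathRest, path = cid, depth = 0) {
  const node = dagStore[cid]
  if (!node) throw new Error('unknown CID: ' + cid)

  if (node.type === 'file') {
    return [{ path, depth, type: 'file', content: node.data }]
  }

  // Directory: accept every link, or only the one named by the next segment.
  const accepts = pathRest[0]
  const children = node.links
    .filter((link) => accepts === undefined || link.name === accepts)
    .flatMap((link) =>
      resolvePath(link.cid, pathRest.slice(1), path + '/' + link.name, depth + 1))

  // The directory itself is listed only when nothing below it was requested.
  return accepts === undefined
    ? [{ path, depth, type: 'dir' }, ...children]
    : children
}

const onlyCat = resolvePath('cid-root', ['cat.jpg'])
const everything = resolvePath('cid-root', [])

console.log(onlyCat)
console.log(everything.map((entry) => entry.path))
```

Requesting cid-root/cat.jpg yields just the file, while requesting cid-root alone yields the directory entry followed by all of its children.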

    Now let's look at how IPFS processes a file. As mentioned earlier, a file may be split into fragments when it is saved to the IPFS network: a large file is divided into small pieces, each with its own hash. A DAGLink is generated from the hash of each fragment; these DAGLinks are collected, in the order in which the fragments appear in the file, into a links array; and the links array is used to build the top-level DAGNode object that represents the file. Our cat is also split into two pieces. As analyzed above, after the directory has been requested, the pull-cat call goes through the stream returned by the createResolver function again and requests the cat's top-level DAGNode object. This time, nodeResolver selects the function in file.js to handle the request. It executes as follows:

    • Set the first object to fetch.

          const accepts = pathRest[0]

          if (accepts !== undefined && accepts !== path) {
            return empty()
          }

      This time the pathRest array is empty, so accepts is undefined.

    • Call the static unmarshal method of UnixFS to unmarshal the Unix file object from the data property of the block object.

          try {
            file = UnixFS.unmarshal(node.data)
          } catch (err) {
            return error(err)
          }

    • Get the file size, the requested length, and the offset, and validate them.

          const fileSize = file.fileSize()

          let offset = options.offset
          let length = options.length

          if (offset < 0) {
            return error(new Error('Offset must be greater than or equal to 0'))
          }

          if (offset > fileSize) {
            return error(new Error('Offset must be less than the file size'))
          }

          if (length < 0) {
            return error(new Error('Length must be greater than or equal to 0'))
          }

    • If the length is 0, create and return a pull-stream once stream.

          if (length === 0) {
            return once({
              depth: depth,
              content: once(Buffer.alloc(0)),
              name: name,
              path: path,
              cid,
              size: fileSize,
              type: 'file'
            })
          }

    • Recalculate the offset and the length.

          if (!offset) {
            offset = 0
          }

          if (!length || (offset + length > fileSize)) {
            length = fileSize - offset
          }

    • Call the streamBytes function to get the requested content based on the offset, the length, and the links array of the node. streamBytes uses a depth-first traversal to fetch all the fragment data of the block object and returns a pull-stream stream. The code is as follows:

          if (offset === fileSize || length === 0) {
            return once(Buffer.alloc(0))
          }

          const end = offset + length

          return pull(
            traverse.depthFirst({
              node,
              start: 0,
              end: fileSize
            }, getChildren(dag, offset, end)),
            map(extractData(offset, end)),
            filter(Boolean)
          )

      The pull-traverse library provides depth-first, breadth-first, and leaf-first algorithms for traversing a tree. Here depth-first traversal is used to visit all the pieces of the file.

    • Create and return a pull-stream values stream. The returned stream is consumed by the stream returned by the createResolver function in resolve.js, which in turn is consumed by the map stream inside the pull call of the ipfs-unixfs-exporter library. That map stream is consumed by the pull call in get-pull-stream.js, and the content is finally converted into a Buffer object by the handler of the pull.asyncMap stream in get.js, so that our program can read the contents of the file from the Buffer object.
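The range handling and the depth-first traversal described above can be combined into one small sketch. It is an illustration under simplifying assumptions: fileDag is a made-up in-memory tree whose leaves already hold their bytes, and normalizeRange, leaves, and readRange are hypothetical helpers. The real streamBytes fetches child blocks lazily through the dag object instead of reading every leaf.

```javascript
// Hypothetical file DAG: internal nodes list children, leaves hold bytes.
const fileDag = {
  links: [
    { links: [{ data: Buffer.from('Hello, ') }, { data: Buffer.from('IPFS ') }] },
    { data: Buffer.from('world!') }
  ]
}

// Validate offset and length as file.js does, then turn them into [offset, end).
function normalizeRange (fileSize, offset, length) {
  if (offset < 0) throw new Error('Offset must be greater than or equal to 0')
  if (offset > fileSize) throw new Error('Offset must be less than the file size')
  if (length < 0) throw new Error('Length must be greater than or equal to 0')
  if (!offset) offset = 0
  if (length === undefined || offset + length > fileSize) length = fileSize - offset
  return { offset, end: offset + length }
}

// Depth-first, left-to-right walk that yields the leaf chunks in file order.
function * leaves (node) {
  if (node.data) {
    yield node.data
    return
  }
  for (const child of node.links) yield * leaves(child)
}

// Return the bytes of the requested range as one Buffer.
function readRange (root, offset, length) {
  const chunks = [...leaves(root)]
  const fileSize = chunks.reduce((sum, chunk) => sum + chunk.length, 0)
  const range = normalizeRange(fileSize, offset, length)

  const parts = []
  let position = 0 // absolute position of the current chunk within the file
  for (const chunk of chunks) {
    const start = Math.max(range.offset - position, 0)
    const stop = Math.min(range.end - position, chunk.length)
    if (start < stop) parts.push(chunk.slice(start, stop))
    position += chunk.length
  }
  return Buffer.concat(parts)
}

const whole = readRange(fileDag).toString()
const middle = readRange(fileDag, 7, 4).toString()
const tail = readRange(fileDag, 12).toString()

console.log(whole)
console.log(middle)
console.log(tail)
```

Because the leaves are visited in file order, each chunk only needs to be trimmed against the requested range, which is roughly the role extractData(offset, end) plays in the real pipeline.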