2. Using pcp

pcp can write files to, compute checksums of files on, or delete files from a set of nodes.

2.1. Setting your PATH

Add the directory containing the pcp executables to your PATH so that the commands below can be run by name. The example assumes pcp is installed under /usr/local/pcp.


   # export PATH=$PATH:/usr/local/pcp/bin

2.2. Writing a file to multiple nodes

The following command copies the local file foo.txt to nodes tgl0, tgl1, tgl2, and tgl3 as /tmp/foo.txt. The -v option produces verbose output. Writing is the default operation, so no switch is needed. Output is shown below.


   # pcp -v foo.txt /tmp/foo.txt tgl0 tgl1 tgl2 tgl3

   ##################################################
   Write succeeded on tgl0.cacr.caltech.edu (131.215.145.40)
   Write succeeded on tgl1.cacr.caltech.edu (131.215.145.41)
   Write succeeded on tgl2.cacr.caltech.edu (131.215.145.42)
   Write succeeded on tgl3.cacr.caltech.edu (131.215.145.43)

2.3. Checksumming a file on multiple nodes

The following command uses the -c option to compute the SHA-1 hash of the remote file /tmp/foo.txt on tgl0, tgl1, tgl2, and tgl3. Output is shown below.


   # pcp -c /tmp/foo.txt tgl0 tgl1 tgl2 tgl3

   Checksum succeeded on tgl0.cacr.caltech.edu (131.215.145.40)
      SHA-1 = 1ccf4925fc3b8767986303a3b16c6c8dfaf7ee13
   Checksum succeeded on tgl1.cacr.caltech.edu (131.215.145.41)
      SHA-1 = 1ccf4925fc3b8767986303a3b16c6c8dfaf7ee13
   Checksum succeeded on tgl2.cacr.caltech.edu (131.215.145.42)
      SHA-1 = 1ccf4925fc3b8767986303a3b16c6c8dfaf7ee13
   Checksum succeeded on tgl3.cacr.caltech.edu (131.215.145.43)
      SHA-1 = 1ccf4925fc3b8767986303a3b16c6c8dfaf7ee13
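
Comparing the hashes by eye becomes error-prone as the node list grows. The following is a minimal sketch of a filter that reads a checksum report in the format shown above and confirms that every node reported the same SHA-1. check_sha1 is a hypothetical helper, not part of pcp, and it assumes the report layout is exactly as in the example output.

```shell
# check_sha1 [EXPECTED]: hypothetical helper, not shipped with pcp.
# Reads a "pcp -c" style report on stdin and checks that every
# "SHA-1 = <hash>" line carries the same hash. If EXPECTED is given,
# every hash must equal it; otherwise the first hash seen is the reference.
check_sha1() {
    awk -v want="${1:-}" '
        $1 == "SHA-1" && $2 == "=" {
            n++
            if (ref == "") ref = (want != "" ? want : $3)
            if ($3 != ref) bad++
        }
        END {
            if (n == 0)  { print "no checksums found"; exit 2 }
            if (bad > 0) { print bad " of " n " checksums differ"; exit 1 }
            print n " checksums match: " ref
        }
    '
}
```

Assuming pcp writes its report to standard output, one possible use is to pipe the checksum command into the helper and pass the hash of the local copy as the expected value:

   # pcp -c /tmp/foo.txt tgl0 tgl1 tgl2 tgl3 | check_sha1 "$(sha1sum foo.txt | cut -d' ' -f1)"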

2.4. Deleting a file on multiple nodes

The following command uses the -d option to delete the remote file /tmp/foo.txt on tgl0, tgl1, tgl2, and tgl3. Output is shown below.


   # pcp -d /tmp/foo.txt tgl0 tgl1 tgl2 tgl3

   Delete succeeded on tgl0.cacr.caltech.edu (131.215.145.40)
   Delete succeeded on tgl1.cacr.caltech.edu (131.215.145.41)
   Delete succeeded on tgl2.cacr.caltech.edu (131.215.145.42)
   Delete succeeded on tgl3.cacr.caltech.edu (131.215.145.43)

2.5. Optimizing the distribution tree

pcp distributes files over an n-ary tree using parallelized, pipelined data transfers. Because network, disk, and CPU speeds vary, the default parameters used to build this tree may not be optimal for every system. Rather than being locked into a suboptimal tree, users can set the tree parameters explicitly in the configuration file $HOME/.pcprc.


   # cat ~/.pcprc
   fanout          4
   frag_size   32768

The example above specifies a 4-ary tree and splits data transfers into 32 KB (32768-byte) fragments. The fragment size is a trade-off: smaller fragments incur more per-fragment processing overhead, while larger fragments increase the store-and-forward delay each fragment incurs as it is relayed through the tree.
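
To get a feel for these two parameters, it can help to work out how many fragments a file breaks into and how many levels the tree needs. The sketch below does that arithmetic only. Both functions are hypothetical helpers, not part of pcp, and tree_depth assumes that the sending host is the root and that each level is filled before the next one starts, which may not match how pcp actually builds its tree.

```shell
# Hypothetical helpers for reasoning about .pcprc values; not part of pcp.

# frag_count SIZE FRAG_SIZE: fragments needed for SIZE bytes, rounding up.
frag_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# tree_depth NODES FANOUT: levels below the root needed to place NODES
# target nodes. Assumption: the sender is the root and every level is
# filled completely before a new level is started.
tree_depth() (
    nodes=$1 fanout=$2
    [ "$fanout" -ge 1 ] || exit 1      # a fanout below 1 can never place a node
    depth=0 level=1 placed=0
    while [ "$placed" -lt "$nodes" ]; do
        level=$(( level * fanout ))    # capacity of the next level
        placed=$(( placed + level ))   # nodes placed so far
        depth=$(( depth + 1 ))
    done
    echo "$depth"
)
```

With the settings above, a 1 MB (1048576-byte) file gives frag_count 1048576 32768 = 32 fragments, and the four example nodes fit in a single level (tree_depth 4 4 = 1). Under the same assumptions, 21 nodes with a fanout of 4 would need three levels, so each fragment would be stored and forwarded twice before reaching the deepest node.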