Command Your Disks With dd


by Peter Kelly (critter)

Introduction

The dd command is one of the oldest Unix commands. The name is often said to mean data duplicator, but the command is sometimes referred to as data destroyer because it does exactly what it is told, which is not necessarily what the user intended. The dd command is very efficient, but care must be taken when using it, especially when issuing the command with superuser privileges.

The command takes a stream of data as its input, optionally applies certain rules and conversions during its passage, and writes it to a nominated destination file or device. As long as everything is going well, dd performs its task quietly and efficiently, printing just a brief summary when the task completes.

Three buffers are used to hold the data on its way through the command: a read or input buffer, a conversion buffer used when a conversion is specified, and an output or write buffer. This configuration allows for considerable flexibility in use. If the if option is not specified then STDIN is used, which can be the output from another command or even the keyboard. If of is not specified then STDOUT is used, which is usually the screen but may also be piped to another command. If all this seems a bit mystical, the examples that follow should make it a little clearer.
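As a small illustration of these defaults, the sketch below runs dd with neither if nor of, so it sits in the middle of a pipeline reading STDIN and writing STDOUT. The sample text and the tr step are only placeholders chosen for the example.

```shell
# dd with no if= and no of=: it reads STDIN and writes STDOUT.
# The summary goes to standard error, which is discarded here
# so that only the data travels down the pipeline.
result=$(printf 'hello from dd\n' | dd 2>/dev/null | tr 'a-z' 'A-Z')
echo "$result"
```

The data passes through dd unchanged; it is tr, the next command in the pipe, that changes the case.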


Basic usage

Like so many Unix/Linux commands, dd is designed to process data flowing through it in a continuous stream. The format of its options, however, is somewhat unusual and takes the form {option}={value}.

The command

dd if=myfile of=outputfile

would simply make a copy of the file myfile named outputfile. The option 'if' defines the input data and the option 'of' defines the output. Other options include 'bs', which sets the size of the blocks of data to operate on, and 'count', which tells the command how many blocks to process. If bs is omitted, the default block size of 512 bytes is used; if count is omitted, data is processed until the end of the input is reached or an error occurs. There are many more options, as we shall see, but these are the most common ones.
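To see bs and count working together, here is a minimal sketch that copies a fixed number of blocks from /dev/zero into a throwaway file and then checks the size. The directory and the file name sample.bin are made up for the example.

```shell
# Work in a throwaway directory so nothing of value is touched.
workdir=$(mktemp -d)

# Copy exactly 4 blocks of 1024 bytes from /dev/zero: 4 x 1024 = 4096 bytes.
dd if=/dev/zero of="$workdir/sample.bin" bs=1024 count=4 2>/dev/null

# wc -c counts the bytes; tr strips the padding some systems add.
size=$(wc -c < "$workdir/sample.bin" | tr -d ' ')
echo "sample.bin is $size bytes"

rm -r "$workdir"
```

The size of the output is simply the block size multiplied by the count, provided the input can supply that much data.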

The danger in using this command carelessly is this: if I wanted to make a copy of a disk to a new, empty disk, then I could use a command similar to the following:

dd if=/dev/sda of=/dev/sdb bs=4096

However, if I were to mix up the drive names, then instead of copying the existing data to the empty drive I would copy the empty sectors of the new drive to the older one, overwriting the data and finishing with two empty drives. Caution is paramount, and you may want to do a test run first, omitting the output destination, as in the following example, which uses a file named testfile that I created for the purpose. With no of= in the command, the contents of the file are echoed to the screen.

dd if=testfile

this is a file to test the output from the dd command

0+1 records in

0+1 records out

55 bytes (55 B) copied, 0.000234282 s, 235 kB/s

If the input file is large, say a hard drive or partition, then you can limit the test output by using a smaller block size and a count of 1 block. Proceed only when you are confident.
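A limited test read of this kind might look like the sketch below. So that it can be run without a real drive or root access, it uses a small image file, disk.img, as a hypothetical stand-in for a device such as /dev/sda, and it pipes the block through od so that raw bytes are shown as a hex dump rather than sent straight to the terminal.

```shell
workdir=$(mktemp -d)
image="$workdir/disk.img"   # hypothetical stand-in for a device such as /dev/sda

# Build a 1 MiB stand-in filled with zeroes.
dd if=/dev/zero of="$image" bs=1024 count=1024 2>/dev/null

# Read only the first 512-byte block and show it as hex,
# with decimal offsets in the left-hand column.
first_block=$(dd if="$image" bs=512 count=1 2>/dev/null | od -A d -t x1)
echo "$first_block"

rm -r "$workdir"
```

The final line printed by od is the offset at which the data ended, which confirms that only one 512-byte block was read.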


Applications

So what can we use it for? The dd command is extremely versatile and is probably best demonstrated with some examples. The following demonstrate some of the power of the command.

Here is a simple one to start with. Create a text file. Simply pass the name of the file to create and then start typing. To close the file, type Control+D and you're done.

user@home# dd of=testfile

This file was created to demonstrate the use of the dd command.

You may enter as much text as you like or just make quick note.

0+2 records in

0+1 records out

128 bytes (128 B) copied, 119.878 s, 0.0 kB/s

user@home# cat testfile

This file was created to demonstrate the use of the dd command.

You may enter as much text as you like or just make quick note.

Here the block size (bs) was not specified, so the default size of 512 bytes was used. The summary shows how many full and partial records were processed. The input records are lines and there are 2 of them, each partial (less than the 512 byte block size). Only one record was written, and this was also less than 512 bytes.
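The full+partial notation in the summary is easier to follow with a file of known size. In this sketch the input is 1300 bytes, which is two full 512-byte blocks plus a partial block of 276 bytes, so the summary should read 2+1 for both input and output. The file names are invented for the example.

```shell
workdir=$(mktemp -d)

# Create a 1300-byte input file: 2 x 512 = 1024, leaving 276 bytes over.
dd if=/dev/zero of="$workdir/in.bin" bs=1300 count=1 2>/dev/null

# Copy it with the default 512-byte block size and capture the summary,
# which dd writes to standard error.
summary=$(dd if="$workdir/in.bin" of="$workdir/out.bin" 2>&1)
echo "$summary"

rm -r "$workdir"
```

The number before the plus sign counts full blocks and the number after it counts partial ones.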

To make an exact copy of a hard drive use the command like this:

dd if=/dev/sda of=/dev/sdb bs=4096 conv=notrunc,noerror

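Here we use a block size of 4 KB, but experimenting with this value may give better results, depending on your system. Two conversion options have been added. The first, notrunc, tells dd not to truncate the output file before writing to it; this matters when the output is a regular file and makes no difference when it is a device. The noerror option tells dd to carry on if a read error is encountered. It is commonly combined with sync, which pads any short read with zeroes so that the data following a bad block stays at the correct offset.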
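After a clone it is worth checking that the copy really matches the original. The sketch below shows one way to do this with cmp. It works on two small image files, source.img and target.img, which are hypothetical stand-ins for /dev/sda and /dev/sdb; with real drives of different sizes the comparison would have to be limited to the size of the source.

```shell
workdir=$(mktemp -d)
src="$workdir/source.img"   # hypothetical stand-in for /dev/sda
dst="$workdir/target.img"   # hypothetical stand-in for /dev/sdb

# Make a small source image with random content (16 blocks of 4096 bytes).
dd if=/dev/urandom of="$src" bs=4096 count=16 2>/dev/null

# Clone it with the same options as the drive-to-drive command.
dd if="$src" of="$dst" bs=4096 conv=notrunc,noerror 2>/dev/null

# cmp -s prints nothing and returns 0 only if the files are identical.
if cmp -s "$src" "$dst"; then
    verdict="identical"
else
    verdict="different"
fi
echo "source and target are $verdict"

rm -r "$workdir"
```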

Before an old drive is discarded it is a good idea to 'wipe' it clean and dd is ideal for this.

dd if=/dev/zero of=/dev/sdb1 bs=1M

dd if=/dev/urandom of=/dev/sdb1

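The first method fills the target with zeroes, overwriting any data. The second method fills it with random data but is slower. As written, both commands wipe only the partition /dev/sdb1; to wipe the whole drive, name the drive itself, for example /dev/sdb. Most modern systems should be able to handle a block size of 1 MB.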

If we use the zeroing method then it is a simple matter to check that no readable data remains.

dd if=/dev/sdb1

As no output file is specified, the data goes to STDOUT, the screen, much as it would with the cat command. Zero bytes are not printable, so a properly wiped partition should produce no visible text.
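Looking at the screen is a rather rough check, so here is a sketch of a stricter one that counts any bytes that are not zero. It uses an image file, wiped.img, as a hypothetical stand-in for /dev/sdb1 so that it can be tried safely.

```shell
workdir=$(mktemp -d)
target="$workdir/wiped.img"   # hypothetical stand-in for /dev/sdb1

# Zero-fill the stand-in, as the wipe command would do to a partition.
dd if=/dev/zero of="$target" bs=1024 count=1024 2>/dev/null

# Delete every zero byte from the stream and count what is left.
# A count of 0 means that nothing but zeroes was read back.
leftover=$(dd if="$target" bs=1024 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')
echo "non-zero bytes found: $leftover"

rm -r "$workdir"
```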

If you want to make a backup of a CD to an image file on your hard drive, use

dd if=/dev/sr0 of=/home/user/mycd.iso bs=2048 conv=sync,notrunc

Change /dev/sr0 to whatever your CD drive is seen as. CDs have a block size of 2048 bytes, so this is the size used here. The image can be mounted with a command such as

mount -o loop /home/user/mycd.iso /mnt/mycd

When you can't remember how to get a command to do something and the help is minimal or is really heavy going, you can sometimes get a hint by looking for human-readable text embedded in the file. The strings command can be utilised here.

user@home# dd if=/bin/ls | strings | grep -a reverse

219+1 records in

219+1 records out

112408 bytes (112 kB) copied, 0.00361762 s, 31.1 MB/s

-r, --reverse reverse order while sorting

Reverse

Need to increase your swap space? Make a swapfile with dd:

dd if=/dev/zero of=/swapspace bs=4k count=250000

chmod 600 /swapspace

mkswap /swapspace

swapon /swapspace

A RAM drive can help speed up operations that require a lot of disk access. You can easily create one or several with dd.

dd if=/dev/zero of=/dev/ram7 bs=1k count=16384

mke2fs -m0 /dev/ram7 4096

Test it:

hdparm -t /dev/ram7

/dev/ram7:

Timing buffered disk reads: 16 MB in 0.02 seconds = 913.92 MB/sec

Mount it:

mkdir /mnt/mem

mount /dev/ram7 /mnt/mem

If you want to duplicate a hard drive partition to use as a backup (in case of disaster), a similar command works at the partition level:

dd if=/dev/sda2 of=/dev/sdb2 bs=4096 conv=notrunc,noerror

Here, the input is the partition we want to clone and the output is where we want it to go, which must obviously be at least as big as the input partition. Do not mix these two up. Next we specify a block size of 4096 bytes, which is 8 sectors of 512 bytes and a reasonable starting point for this operation. Finally, we specify the same two conversion options as before, notrunc and noerror. It is usually a good idea to run the fsck command on the new partition before you use it.

Convert a text file to uppercase (use conv=lcase instead to convert to lowercase):

dd if=filename of=new_filename conv=ucase
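The case conversions can be tried out without any files at all by feeding dd from a pipe. This is only a quick sketch, and the sample text is arbitrary.

```shell
# conv=ucase converts lowercase letters to uppercase,
# conv=lcase does the opposite.
upper=$(printf 'Mixed Case Text\n' | dd conv=ucase 2>/dev/null)
lower=$(printf 'Mixed Case Text\n' | dd conv=lcase 2>/dev/null)
echo "$upper"
echo "$lower"
```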

Benchmark a drive:

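Write test:

dd if=/dev/zero bs=1024 count=1000000 of=/home/me/1Gb.file

Read test:

dd if=/home/me/1Gb.file bs=64k | dd of=/dev/null

Feel free to play around with bs and file size. Bear in mind that a read test run straight after the write test may be served largely from the memory cache rather than from the drive.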


Watching the progress

Many of the operations that dd is used for, such as cloning a partition, are quite lengthy and dd is very secretive, showing no output. How do you know then that the process is actually running and not just locked up or stopped? There are a couple of ways to approach this.

The first way is to pass the process the USR1 signal. Usually, we pass a signal to a process to terminate or kill it, but the dd command was written to do something special when it receives this signal. It temporarily stops processing data, outputs some statistics to Standard Error (usually the screen), and then resumes its task. To make use of this, we run dd in the background and capture its process identification number (PID). If we then send the signal to that PID, we shall see where dd is up to. This is easier than it sounds. Here, I am creating a 1 GB file of random data. The PID is to be found in the $! shell variable.

user@home $ dd if=/dev/urandom of=testfile bs=4k count=256000 & pid=$!

[2] 3106

[1] Exit 1 dd if=/dev/urandom of=testfile bs=4k count=256000 pid=$!

user@home $ kill -s USR1 $pid

28057+0 records in

28056+0 records out

114917376 bytes (115 MB) copied, 12.4044 s, 9.3 MB/s

user@home $ kill -s USR1 $pid

72844+0 records in

72843+0 records out

298364928 bytes (298 MB) copied, 32.2774 s, 9.2 MB/s

user@home $ kill -s USR1 $pid

155666+0 records in

155665+0 records out

637603840 bytes (638 MB) copied, 69.0612 s, 9.2 MB/s

user@home $

256000+0 records in

256000+0 records out

1048576000 bytes (1.0 GB) copied, 113.598 s, 9.2 MB/s
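Typing the kill command by hand soon becomes tedious, so the same idea can be put in a small loop. The following is only a sketch and it assumes GNU dd, as found on Linux; on some other systems USR1 terminates dd instead. To keep the example quick and harmless, dd is fed from a subshell that deliberately delays its data for three seconds, and the statistics are collected in a temporary file rather than shown on the screen as they arrive.

```shell
stats=$(mktemp)

# The subshell delivers its data only after 3 seconds, so dd is
# still waiting for input while the signals are being sent.
( sleep 3; printf 'some data\n' ) | dd of=/dev/null 2>"$stats" &
pid=$!

# Ask dd for its statistics twice, one second apart.
for attempt in 1 2; do
    sleep 1
    kill -s USR1 "$pid"
done

# Wait for dd to finish, then see how many reports it produced.
wait "$pid"
reports=$(grep -c 'records in' "$stats")
echo "dd reported its statistics $reports times"
cat "$stats"

rm "$stats"
```

For a real transfer the loop would run for as long as dd does, and the pause between signals would be chosen to suit the length of the job.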

Another way to achieve this, which I think is a more elegant solution, is to use the pv command, which you will have to install from the repositories. The input file is piped into the dd command using the pv command. The options I have passed to pv are:

-p show a progress bar

-e show the estimated time to completion

-s 1000m tells pv the size of the file I am transferring (This is used by -e).

pv -pe -s 1000m /dev/urandom | dd of=testfile bs=4k count=256000

Partial output -

[======================> ] 31% ETA 0:01:19

Final Output -

256000+0 records in

256000+0 records out

1048576000 bytes (1.0 GB) copied, 115.727 s, 9.1 MB/s

[====================================================================>] 100%


Summary

The dd command is a very useful tool to have around, and it is worth the effort to learn at least the basics. It should however be used with a little caution when transferring valuable data. There are times when a simple copy (cp) will do the job, and other times when a command such as rsync would be more appropriate. However, the usefulness of dd cannot be denied. Unrestricted by file structures or block sizes, for raw data transfer, dd is limited only by your imagination.