Use Cron For Rsyncing the Repository

by Loyed

I guess the first question that would come to mind is: why would anyone want a local copy of the PCLOS repository? It's actually a simple reason for me. You see, I have four computers in the house that are currently running PCLOS, with a possible two more on the way. There are other reasons, of course, that one might want to copy the repositories, such as testing or remastering: basically any time you might have to access multiple files (or multiple copies of the same file) from the repositories frequently.

So it occurred to me one day while I was updating the three main computers, why should I take up my time downloading from the repository a copy of each updated file that I needed for each computer? By this time I had heard of other people having local area repositories, so I decided that would be a great idea. It all starts out with a lovely little command called rsync, and of course a search through the forums to find out how others use it. My search through the forums didn't yield much, but I found the rsync command that Ikerekes uses, so I felt I had a good base at least. The original command that Ikerekes posted was:

rsync -av -P --stats --delete --exclude=SRPM* ftp.heanet.ie::pub/pclinuxos/apt/pclinuxos/2007/ /mnt/hd/texstar/pclinuxos/apt/pclinuxos/2007/

Unfortunately that was using heanet which is on the other side of the world from me, and wouldn't work well for what I had planned to do at night before bed (about the time people in that area would be waking up and going to work).

Before we go much further, let's look at what all those options on the rsync command do. First we have "-av", which sets rsync to (a)rchive and (v)erbose. Archive means that it copies folders recursively and preserves things like timestamps, permissions and symbolic links; preserving the timestamps is what lets later runs skip files that have not changed since the last copy. Verbose means that it will tell you everything that is going on (which is great if you want to create a log in case of an error). Next you have "-P" (yes, that's a capital P), which tells rsync to keep any partially transferred files and to show the progress of each file as it is transferred. That way, should the transfer get interrupted, it won't have to download that whole file again. Then we have the "--stats" option, which prints a summary at the end of the run, such as how many files and how many bytes were transferred. My favorite part, and the next one, is the "--delete" option. This tells rsync that if a file no longer exists in the online repository, it should delete it from the local copy. That way you don't have a bunch of dead files just lying about taking up disc space. Last is the "--exclude=" option, which tells rsync to skip over anything that matches the pattern after the equals sign, in this case SRPM*. The last two bits are of course the source and destination. The source you will have to change to fit the repository that you want to use, and the destination will be a local folder that you want it all copied to.
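To make the option list easier to scan, here is the same kind of command with one comment per option. This is only a sketch: the source and destination are the ones from Ikerekes' command, "--dry-run" has been added so that nothing would be copied or deleted until you take it out, the pattern is quoted so the shell leaves the * alone, and mirror_cmd is a made-up helper name used just for this illustration. It prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the rsync mirror command for a source and a
# destination. Nothing is transferred here; the command is only displayed.
mirror_cmd() {
    src=$1
    dest=$2
    # -a         archive mode: recurse, keep timestamps, permissions, links
    # -v         verbose: list each file as it is handled
    # -P         same as --partial --progress
    # --stats    print a transfer summary at the end of the run
    # --delete   remove local files that are gone from the source
    # --dry-run  (added for safety) show what would happen, change nothing
    # --exclude  skip anything matching the pattern
    echo "rsync -av -P --stats --delete --dry-run --exclude='SRPM*' $src $dest"
}

mirror_cmd ftp.heanet.ie::pub/pclinuxos/apt/pclinuxos/2007/ \
           /mnt/hd/texstar/pclinuxos/apt/pclinuxos/2007/
```

Once the printed command looks right, it can be pasted into a terminal, and "--dry-run" removed when you are happy with what it reports.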

The difficult part is figuring out how to write the source. If you compare Ikerekes' source to the address in Synaptic, you would think that you just need to copy the address and put the double colon between the host name and the rest of the address. Unfortunately, it's not that easy. It's actually the host address, followed by the double colon, then the name the server has given its rsync share (rsync calls this a module), then the path inside that share. The module name does not have to match the web address. So when I went with the Indiana University repository it was:

spout.ussg.indiana.edu::pclinuxos/pclinuxos/apt/pclinuxos/2007

whereas the actual path is:

http://spout.ussg.indiana.edu/linux/pclinuxos/pclinuxos/apt/

It took some playing around for me to finally get the right combination, but knowing how it works (or just using one of the two sources presented here) it should go faster for you.
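The pieces of an rsync source can be sketched like this. The rsync_source helper is a made-up name for illustration; it only joins the host, the module and the path inside the module into the host::module/path form. The commented commands show how one might ask a server which modules it offers, which needs a network connection and a server that still answers rsync requests.

```shell
#!/bin/sh
# Hypothetical helper: join host, rsync module, and the path inside the
# module into the host::module/path form that rsync expects.
rsync_source() {
    host=$1
    module=$2
    path=$3
    printf '%s::%s/%s\n' "$host" "$module" "$path"
}

# To see which modules a server offers (needs network access):
#   rsync spout.ussg.indiana.edu::
# To look inside one module before settling on a path:
#   rsync spout.ussg.indiana.edu::pclinuxos/

rsync_source spout.ussg.indiana.edu pclinuxos pclinuxos/apt/pclinuxos/2007
```

The last line prints the same source that was used for the Indiana University repository above.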

So we have rsync set up. Don't you think it would be great to have your local repository update itself every night while you sleep? Thanks to Intoit for pointing me towards cron. It's a wonderful little daemon that allows you to schedule a command to run at a given time and frequency. The part that we will use is crontab. This is the command that allows a user to view and edit their crontab file. There are two main functions that you will want to use for crontab.

The first is crontab -e, which allows you to edit the cron file and schedule an event to occur. To properly use this function it also helps to know how to use vi. The main commands you will need for vi (that is, if you're like me and use it so rarely that you cannot remember them) are: i for insert, Esc to go back to command mode, ":w" to write the file to disc, and ":q" to quit. Once you have entered crontab -e as root you will be put into vi. The only thing that will be entered in thus far is "min(0-59) hours(0-23) day(1-31) month(1-12) dow(0-7) command", commented out as an example of how to set it up. The first two, min and hours, are the time of day. The next two, day and month, allow you to set a specific day of the month and month of the year. Last is dow, which stands for "day of the week" (0 and 7 are both Sunday, so 1 is Monday). If you are going to update every day, then all you will need to worry about is the min and hours section. So, if you want to schedule it to run at midnight every night, then you would enter "0 0 * * * command". Or if you would prefer to have it run every Tuesday at 3 in the morning it would be "0 3 * * 2 command".

The next option you will want to use is crontab -l. That will list any cron jobs that the user is currently set up to run. That way you can verify that your job is indeed set up to run.
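As a small illustration of how the fields line up, the sketch below builds crontab lines from a minute, an hour, a day of the week and a command. The cron_line helper and the /root/sync-repo.sh path are assumptions made up for this example; the script is imagined to hold your rsync command.

```shell
#!/bin/sh
# Hypothetical helper: build one crontab line from minute, hour,
# day of week (0-7, where 0 and 7 are Sunday, or * for every day)
# and a command. Day of month and month are left as * (every one).
cron_line() {
    printf '%s %s * * %s %s\n' "$1" "$2" "$3" "$4"
}

# /root/sync-repo.sh is an assumed script holding your rsync command.
cron_line 0 0 '*' /root/sync-repo.sh   # midnight, every night
cron_line 0 3 2   /root/sync-repo.sh   # 03:00 every Tuesday
```

Either printed line can be typed into the file that crontab -e opens, and crontab -l should then show it.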

Now that we have a local copy of the repository, it's time to set things up so that we can use it. First open Synaptic, then go to Settings -> Repositories. Then click New to start setting up a new repository to draw from. Under URL, you will put "file:/", then the path to the folder you put the repository into. For distribution, it would be pclinuxos given the command from Ikerekes, but the command that I ended up using makes me use 2007 instead. To find out which one you have to use, just go to where you downloaded the repository and check which folder was downloaded. If the first folder is pclinuxos, then use that; if it's 2007, then use 2007. Last is the sections. You will need "main extra nonfree kde". You may add in any other section that you would like (such as testing).

If you are going to share this repository with other computers on your network, I would suggest a Network File System (NFS) share. They can actually be quite easy to set up. First, on the computer with the repository, go to the PCLinuxOS Control Center (PCC) (Configure Your Computer), then Mount Points, and Manage NFS Shares. If you don't have NFS set up already, then it will take a while to install the necessary packages. Then you just add the folder that your repository is saved to as a share. Next you have to set it up on the client computers. Return to PCC, Mount Points, then Set NFS Mount Points. Once again, if you haven't used it, you will have to wait for the packages to install. Then you have to select the server and the share that you want to mount. From there you set up a mount point for the share (where the share will reside within your directory system). One option you might want to keep in mind while setting it up is "user" under the advanced options, which lets an ordinary user mount the share. That way you can mount the share yourself before you go into Synaptic to install anything.

Last, you might want to go to any computer that will be updating via this same shared repository, and set them to update every morning (sometime after the repository is done refreshing). That way you don't even have to go to Synaptic to get your updates every day. You will need to do two commands for this though. First you have to "apt-get update", to get the new file list for the repository, then "apt-get upgrade -y" to upgrade any files that are marked for upgrade. So, have fun with your very own local repository.
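The two commands above could be wrapped in a small script for root's crontab on each client. What follows is a sketch under assumptions, not a tested recipe: the run and update_client helpers, the DRY_RUN switch and the script's eventual location are all made up for illustration. As written it only prints what it would do; set DRY_RUN=0 to have it really call apt-get.

```shell
#!/bin/sh
# Sketch of a client update script meant to be run from root's crontab.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: run a command, or just show it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

update_client() {
    # Refresh the package lists first; only upgrade if that succeeded.
    run apt-get update && run apt-get -y upgrade
}

update_client
```

If the script were saved as, say, /root/update-client.sh (an assumed path), a crontab line such as "30 6 * * * DRY_RUN=0 /root/update-client.sh" would run it at 6:30 every morning, which should be some time after the repository has finished refreshing.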

More rsync tips contributed by Dean Youngquist:

You could also use Kcron, a GUI program. This would avoid lessons on vi, an editor disliked by many.

--partial is the option that keeps partially transferred files. -P is the same as --partial --progress. If you mean to use --partial --progress and the reader is a newbie, then keep them as separate options. Instead of -av -P you can write -avP. The v is not needed if using P or --progress, so -aP is the same as -avP.

I would also like to mention --bwlimit in case bandwidth must be shared with something else during the initial download. If you have all night, might as well be nice to the server and take it slowly. You can query a server in the repo list to learn if it is an rsync server. If it is an rsync server it will respond with a bunch of details.

rsync -n spout.ussg.indiana.edu::
rsync -n ftp.heanet.ie::

Might be good to also mention -n, --dry-run. It can help one avoid an accidental deletion when using --delete. It might also be wise to put more excludes in the examples. RPMS.testing and RPMS.sam are significantly large and probably will not be used. RPMS.testing changes frequently.

--exclude=RPMS.testing/

Size of each section:

4.1k    RPMS.drivers
4.5G    RPMS.extra
1.5G    RPMS.kde
4.1G    RPMS.main
932M    RPMS.nonfree
239M    RPMS.sam
4.1k    RPMS.sam.testing
520M    RPMS.testing
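To see how a dry run, --delete and an exclude behave together without touching a real mirror, one could try something like the following on a throwaway folder. Everything here is made up for the demonstration (the temporary folder and the three .rpm file names), and it assumes rsync and mktemp are installed; no network is involved.

```shell
#!/bin/sh
# Local demonstration only: no network, everything lives in a temp folder.
command -v rsync >/dev/null 2>&1 || { echo "rsync is not installed"; exit 0; }

work=$(mktemp -d)
mkdir -p "$work/mirror/RPMS.main" "$work/mirror/RPMS.testing" "$work/local/RPMS.main"
echo new  > "$work/mirror/RPMS.main/current.rpm"      # still on the mirror
echo test > "$work/mirror/RPMS.testing/unstable.rpm"  # section we do not want
echo old  > "$work/local/RPMS.main/obsolete.rpm"      # gone from the mirror

# Dry run (-n): lists what would be copied and deleted, changes nothing.
rsync -avn --delete --exclude=RPMS.testing/ "$work/mirror/" "$work/local/"
ls -R "$work/local"    # obsolete.rpm is still there

# Real run: current.rpm is copied, obsolete.rpm is deleted,
# and RPMS.testing is never transferred.
rsync -av --delete --exclude=RPMS.testing/ "$work/mirror/" "$work/local/"
ls -R "$work/local"
```

The same options apply with a remote source. For the first big download, something like --bwlimit=200 (an arbitrary example value, in kilobytes per second) could be added to go easy on the server. The temporary folder can be removed afterwards with rm -rf "$work".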

Below is a list of servers known to offer the PCLinuxOS repo via rsync. ibiblio is deliberately left out: it is the primary mirror, and as there are several other repos it may be wise not to use it, thus keeping it free for the other repos to sync from. A command line for testing each download mirror is included:

rsync -avn --stats spout.ussg.indiana.edu::pclinuxos/pclinuxos/forums.xml.gz ./

rsync -avn --stats ftp.heanet.ie::pub/pclinuxos/forums.xml.gz ./

rsync -avn --stats ftp.sh.cvut.cz::pclinuxos/forums.xml.gz ./

rsync -avn --stats rm.mirror.garr.it::pclinuxos/forums.xml.gz ./

rsync -avn --stats ftp.leg.uct.ac.za::pub/linux/pclinuxos/forums.xml.gz ./
