
Re: copying many files



Hi Tristan,
        I would suggest you have the wrong hardware and, as Alex
and Marcel alluded to, an improper configuration for today's larger
disks.
        Check the block size on your disks (if not SCSI). It should
match the block size you select for the Linux filesystem. Selecting
a smaller block size makes little to no difference in speed; blocks
of any size are processed in about the same way. Where you are
burning speed is in loading up the CPU (a context switch), processing
the data (usually a disk reference calculation), and then unloading
the CPU again; the actual calculation time (in integers) is small,
usually about 2 CPU cycles. Parallel CPUs can often run slower because
the system has to decide which CPU to run the calculation on. 8 and
16 MB disk caches can go only so far in covering this up; after that
they slow down to raw disk speeds. Smaller block sizes only allow more
files of smaller sizes, in return for slightly slower performance.
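
To make the block size check concrete, here is a minimal Python sketch,
assuming a Unix-like system where os.statvfs is available. It reports
the block size of the filesystem holding a path and estimates how much
of each block a small file leaves unused. The helper names
(fs_block_size, blocks_used, slack_bytes) are made up for illustration,
and the estimate ignores inodes and other filesystem metadata.

```python
import os


def fs_block_size(path="."):
    """Return the block size, in bytes, reported for the filesystem holding path."""
    st = os.statvfs(path)  # Unix-only call
    # f_frsize is the fundamental block size; fall back to f_bsize if it is 0.
    return st.f_frsize or st.f_bsize


def blocks_used(file_size, block_size):
    """Number of whole blocks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


def slack_bytes(file_size, block_size):
    """Bytes allocated but unused when file_size bytes are stored in whole blocks."""
    return blocks_used(file_size, block_size) * block_size - file_size


if __name__ == "__main__":
    bs = fs_block_size(".")
    size = 400  # the approximate file size mentioned in the original question
    print("filesystem block size:", bs)
    print("blocks used by a", size, "byte file:", blocks_used(size, bs))
    print("unused bytes in those blocks:", slack_bytes(size, bs))
```

With a 4096-byte block, for example, a 400-byte file occupies one block
and leaves 3696 bytes of it unused, which is the trade-off between
block size and the number of small files a disk can hold.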
        Do not use the same disk for the transfers. Get another disk
for this. A 64-bit SATA or SCSI card (RAID?) will slow access to any
one disk, but drastically improve your throughput, which is your basic
problem. These cards have a much larger cache (64 MB) and often a
built-in coprocessor.
        Also, copying from here to there on the same disk is ALWAYS
hard on the disk. The CPU (or SCSI controller) keeps swapping
back and forth between reading and writing to keep the disk going.
SCSI is better, though: it does NOT tie up the CPU with disk
reference calculations.
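
A quick way to see whether a copy stays on one filesystem is to compare
device IDs. The sketch below is only an approximation of "same disk":
st_dev identifies the mounted filesystem, so two partitions on one
physical disk will show different IDs even though they share the same
heads. The function name same_filesystem is hypothetical.

```python
import os


def same_filesystem(src, dst):
    """Return True if src and dst live on the same mounted filesystem.

    Assumption: this compares st_dev only. It cannot tell whether two
    different filesystems sit on the same physical disk.
    """
    return os.stat(src).st_dev == os.stat(dst).st_dev


if __name__ == "__main__":
    import sys

    # Usage: python check_dev.py SOURCE_DIR DEST_DIR
    src, dst = sys.argv[1], sys.argv[2]
    if same_filesystem(src, dst):
        print("source and destination share a filesystem")
    else:
        print("source and destination are on different filesystems")
```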
        If I had to do this, I would get a second disk as my first upgrade.
        Second would be to format the disks as ext2, which is not journaled.
        Third would be to reset the block size to the disk block size.
        Fourth would be to increase memory (for caching).
        Fifth would be to opt for a SCSI card (much more expensive!).
        Sixth would be to apply RAID (just plain vanilla striping).
        Then get as many high-speed drives as possible working.
I think you are flooding the cache and forcing the machine to fall
back to raw disk speeds. Writing always takes more time than reading.
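
To get a rough feel for per-file cost on a given machine, a small
timing sketch such as the following may help. It writes a batch of
400-byte files into a directory and then reads them back. The function
names and the default count of 1000 are arbitrary choices for
illustration. Note the big caveat: files read back right after being
written are usually served from the operating system's cache, so the
read figure does not reflect raw disk speed.

```python
import os
import tempfile
import time


def time_small_file_writes(directory, count=1000, size=400):
    """Create count files of size bytes each; return elapsed seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, f"f{i:06d}.dat"), "wb") as fh:
            fh.write(payload)
    return time.perf_counter() - start


def time_small_file_reads(directory):
    """Read every regular file in directory; return (elapsed seconds, bytes read)."""
    start = time.perf_counter()
    total = 0
    for entry in os.scandir(directory):
        if entry.is_file():
            with open(entry.path, "rb") as fh:
                total += len(fh.read())
    return time.perf_counter() - start, total


if __name__ == "__main__":
    # A temporary directory keeps the test files out of the way.
    with tempfile.TemporaryDirectory() as tmp:
        write_s = time_small_file_writes(tmp)
        read_s, nbytes = time_small_file_reads(tmp)
        print(f"write: {write_s:.3f} s, read: {read_s:.3f} s, {nbytes} bytes")
```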
        Thanks, and good luck!
                jdm





At 06:37 PM 4/20/2008, Tristan Lefebure wrote:
>Hi,
>
>These days I often have to copy several million files from one folder to 
>another on the same computer (and usually the same disk), and it takes a 
>while with a regular 'cp' approach (several hours).
>
>The files are rather small (~400 bytes), so I think that most of the time is 
>spent creating the files, not copying the data. Would you have a suggestion 
>to speed up the process?
>
>I've already tried to create a tar archive, but it also takes a while to 
>create and extract the archive. Should I use another file system? (I use 
>ext3 with Ubuntu 7.10.)
>
>Thanks for any help!
>-- 
>Tristan Lefebure
>
>Population Medicine & Diagnostic Sciences
>College of Veterinary Medicine
>Cornell University
>
>phone: (607) 253 4228
>
>http://www.people.cornell.edu/pages/tnl7/

James Marco
Computer Operations Manager
Biomedical Engineering & Chemical and Biomolecular Engineering
Cornell University
B77 Olin Hall,
Ithaca,  NY  14853
Office: 255-7312, Computer Room: 255-0480