README for JTE 1.5
Steve McIntyre
05 Jul 2004

JTE - Jigdo Template Export
===========================

Introduction

Jigdo is a useful tool to help in the distribution of large files like CD
and DVD images. See Richard Atterer's site for more details. Debian CD and
DVD ISO images are published on the web in jigdo format to allow end users
to download them more efficiently.

Jigdo is generic and powerful - it can be used for any large files that
are made up of smaller files. However, this generality is costly. Creating
jigdo files from ISO images is quite inefficient - to work out which files
are included in the ISO image, jigdo has to calculate and compare
checksums of every possible file and every extent in the image.
Essentially it has to brute-force the image. On my home system, it can
take several hours to do this for a 4.5GB DVD image.

I can see a few ways to improve this:

 1. Modify jigdo so it knows about the internals of ISO images and can
    efficiently scan them (bad, not very generic for jigdo)
 2. Write a helper tool to dump extra information for jigdo to use
    alongside the ISO image (helper tool written, but modifying jigdo to
    use this looks HARD)
 3. Patch mkisofs to write .jigdo and .template files alongside the ISO
    image

I've now done the third of these, and called it JTE (or Jigdo Template
Export). The code works fine, and runs in a very small fraction of the
time taken to run mkisofs and jigdo separately. The output .jigdo and
.template files work correctly, i.e. jigdo-file and the wrapper script
jigdo-mirror accept them and will generate an ISO image that exactly
matches the original.

Current versions of JTE now also come with mkimage, a simple and very fast
tool to reconstruct image files from .jigdo and .template files. It's not
meant to compete with the normal jigdo-file program. It doesn't have any
logic to cope with downloading missing files, for example. But it is much
faster for people (like me!)
who already have full local mirrors.

There's now also a Perl CGI script, iso-image.pl, to wrap around mkimage
if you'd like to be able to offer images for HTTP download without using
up multiple gigabytes of disk space. For added network efficiency, the
Perl CGI also supports HTTP/1.1 byte ranges so clients can resume aborted
downloads.

The addition of these two extra tools means that I'm now distributing JTE
as a tarball rather than just a mkisofs patch; the patch is inside the
tarball too.

----------------------------------------------------------------------

How to use JTE

To use the jigdo creation code, specify the location of the output .jigdo
and .template files alongside the ISO image. You can also specify the
minimum size beneath which files will just be dropped into the binary
template file data rather than listed as separate files to be found on the
mirror, and exclude patterns to ignore certain files in the same way.
Paths in the original filesystem can also be mapped onto more global
namespaces using the [Servers] section in the .jigdo file. For example:

  mkisofs -J -r -o /home/steve/test1.iso \
      -jigdo-jigdo /home/steve/test1.jigdo \
      -jigdo-template /home/steve/test1.template \
      -jigdo-min-file-size 16384 \
      -jigdo-ignore "README*" \
      -jigdo-map Debian=/mirror/debian \
      -md5-list /home/steve/md5.list \
      /mirror/jigdo-test

If the -jigdo-* options are not used, the normal mkisofs execution path is
not affected at all. The above invocation will create 3 output files
(.iso, .jigdo and .template). Multiple -jigdo-ignore and -jigdo-map
options are accepted, for multiple ignore and map patterns.

Use the -md5-list option to specify the location of a list of files and
their md5sums in normal md5sum format. mkisofs will then compare the
checksum of each file it is asked to write against the checksum of that
file in the list. It will abort on any mismatches.
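
To make the -md5-list check a little more concrete, here is a minimal
Python sketch of the same idea: read a list in md5sum format, recompute
each file's checksum and report any file that does not match. This is only
an illustration and is not the code in the mkisofs patch; the function
names (parse_md5_list, md5_of_file, find_mismatches) are made up for this
example, and it assumes plain md5sum lines without escaped file names.

```python
import hashlib
import os
import tempfile


def parse_md5_list(text):
    """Parse md5sum-style lines into a {path: hex digest} dict.

    Each line is 32 hex digits, a space, a mode character (' ' or '*'),
    then the file name. Escaped names (leading backslash) are not handled.
    """
    sums = {}
    for line in text.splitlines():
        if len(line) < 35 or line[32] != " ":
            continue  # skip blank or malformed lines
        sums[line[34:]] = line[:32].lower()
    return sums


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def find_mismatches(md5_list_text):
    """Return the listed paths whose current MD5 differs from the list."""
    expected = parse_md5_list(md5_list_text)
    return [path for path, digest in expected.items()
            if md5_of_file(path) != digest]


def demo():
    # Hypothetical data: one temporary file and a one-line list for it.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.deb")
        with open(path, "wb") as f:
            f.write(b"example payload\n")
        listing = "%s  %s\n" % (md5_of_file(path), path)
        bad = find_mismatches(listing)
        if bad:
            # Mirror the "abort on any mismatch" behaviour described above.
            raise SystemExit("checksum mismatch: %s" % ", ".join(bad))
        print("all checksums match")


if __name__ == "__main__":
    demo()
```

A list suitable for this sketch can be produced by running md5sum over the
files that will go into the image.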

More options have now been added in version 1.2 so that you can specify
the location of boot files within the ISO image. Previously the four
architectures alpha, hppa, mips and mipsel needed separate tools to make
an ISO image bootable. This also made life very hard when trying to
produce jigdo files. Instead, I've folded boot support for those
architectures into this patch so that mkisofs will do all the work:

+-----------------------------------------------------------------------------+
|Alpha                                                                        |
|-----------------------------------------------------------------------------|
|-alpha-boot     |Specify the location of the boot image (relative to the root|
|                |of the ISO image)                                           |
|-----------------------------------------------------------------------------|
|Hppa                                                                         |
|-----------------------------------------------------------------------------|
|-hppa-cmdline   |Specify the hppa boot command line. Separate elements with  |
|                |commas or spaces.                                           |
|----------------+------------------------------------------------------------|
|-hppa-kernel-32 |Specify the location of the 32-bit boot image (relative to  |
|                |the root of the ISO image)                                  |
|----------------+------------------------------------------------------------|
|-hppa-kernel-64 |Specify the location of the 64-bit boot image (relative to  |
|                |the root of the ISO image)                                  |
|----------------+------------------------------------------------------------|
|-hppa-bootloader|Specify the location of the bootloader code (iplboot,       |
|                |relative to the root of the ISO image)                      |
|----------------+------------------------------------------------------------|
|-hppa-ramdisk   |Specify the location of the ramdisk (relative to the root of|
|                |the ISO image)                                              |
|-----------------------------------------------------------------------------|
|Mips                                                                         |
|-----------------------------------------------------------------------------|
|-mips-boot      |Specify the location of the boot image (relative to the root|
|                |of the ISO image)                                           |
|-----------------------------------------------------------------------------|
|Mipsel                                                                       |
|Mipsel is awkward because we have to parse the ELF header of the boot loader |
|and write some locations from it into the boot sector. Ick!                  |
|-----------------------------------------------------------------------------|
|-mipsel-boot    |Specify the location of the boot image (relative to the root|
|                |of the ISO image)                                           |
+-----------------------------------------------------------------------------+

----------------------------------------------------------------------

How it works

I've hooked all the places in mkisofs where it will normally write image
data. All the normal data write calls (directory entries etc.) I simply
pass through and build into the template file. Any file data entries are
passed through with information about the original file. If that file is
large enough, I grab the filename and the MD5 of the file's data so I can
just write a file match record into the template file (and then the jigdo
file).

----------------------------------------------------------------------

How fast is it?

On my laptop (600MHz P3, slow laptop disk) I can make a template file in
parallel with the ISO image from a typical 500MB data set in about 2
minutes. By simply not creating the ISO (-o /dev/null), this time halves
again. The data set I'm using here is a copy of the woody i386 r2 update
CD, as it's a handy image I had lying around.

On my faster home server machine (1.7GHz Athlon, 512MB RAM, fast SCSI
disks), I can produce a 7GB DVD ISO image with the jigdo and template
files in about 8 minutes. A debian-cd run from start to finish to create
DVD images takes about 25 minutes per architecture. mkisofs is normally
I/O-bound on this system, but when running the jigdo creation code it
becomes CPU-bound, as it is now running 2 MD5 checksums on each data block
that it sees.

To boost performance when creating images, running 2 copies of debian-cd
in parallel on a single system should parallelise nicely - ideally run the
CD and DVD versions of each architecture together to get maximum benefit
from the dentry and page cache.

----------------------------------------------------------------------

How to use mkimage

mkimage is a faster, local-only version of "jigdo-file make-image", again
written in portable C. It takes a few options:

+-----------------------------------------------------------------------------+
|-j       |                                                                   |
|---------+-------------------------------------------------------------------|
|-t       |                                                                   |
|         |                                                                   |
|---------+-------------------------------------------------------------------|
|-m       |to to find the files in the mirror                                 |
|path>    |                                                                   |
|---------+-------------------------------------------------------------------|
|-l       |mkisofs                                                            |
|---------+-------------------------------------------------------------------|
|         |Don't bother checking md5sums of the input files, or of the output |
|-q       |image.                                                             |
|         |WARNING: this may lead to corrupt images, but is much faster       |
|---------+-------------------------------------------------------------------|
|-s       |will start at the beginning (offset 0). Added for iso-image.pl use |
|---------+-------------------------------------------------------------------|
|-e       |will run all the way to the end of the image. Added for            |
|         |iso-image.pl use                                                   |
|---------+-------------------------------------------------------------------|
|         |Don't attempt to reassemble the image; simply parse the image      |
|-z       |descriptor in the template file and print the image size. Added for|
|         |iso-image.pl use                                                   |
+-----------------------------------------------------------------------------+

Specifying a start or end offset implies -q - it's difficult to check MD5
sums if the full image is not generated!
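
The -s and -e options exist so that a wrapper can ask for just part of an
image, which is what an HTTP byte-range request needs. As a rough sketch
only, the Python below shows one way a wrapper might turn a Range header
into a start and end offset. It is not taken from iso-image.pl: the
function name parse_byte_range is hypothetical, the end offset is treated
as inclusive (as in HTTP), and whether mkimage's -e expects an inclusive
or exclusive end is not stated here, so check before reusing the numbers.

```python
def parse_byte_range(header, image_size):
    """Turn an HTTP Range header into (start, end) inclusive offsets.

    Returns None if the header is malformed, asks for several ranges
    (not handled in this sketch) or cannot be satisfied for image_size.
    """
    if image_size <= 0 or not header.startswith("bytes="):
        return None
    spec = header[len("bytes="):].strip()
    if "," in spec:
        return None  # multiple ranges are out of scope here
    first, sep, last = spec.partition("-")
    if not sep:
        return None
    try:
        if first == "":
            # Suffix form "bytes=-N": the last N bytes of the image.
            length = int(last)
            if length <= 0:
                return None
            start = max(image_size - length, 0)
            end = image_size - 1
        else:
            start = int(first)
            # Open-ended form "bytes=N-": run to the end of the image.
            end = int(last) if last else image_size - 1
    except ValueError:
        return None
    end = min(end, image_size - 1)  # clamp to the last byte of the image
    if start < 0 or start > end:
        return None
    return start, end


if __name__ == "__main__":
    # A client resuming a download of a 1000-byte image from byte 500.
    print(parse_byte_range("bytes=500-", 1000))
```

The image size needed for this kind of calculation is what the -z option
prints.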

----------------------------------------------------------------------

How to use iso-image.pl

iso-image.pl is a small perl wrapper script written to drive mkimage and
turn it into a CGI. It will parse the incoming request (including
byte-ranges) and call mkimage to actually generate the image pieces
wanted. Configuration is simple: place iso-image.pl in a cgi-bin directory
and set various paths in the script:

  * The path to mkimage
  * A log file location
  * The path to the template and jigdo files
  * Other options to mkimage (e.g. -q and match locations)

----------------------------------------------------------------------

What's left to do?

 1. Updates to debian-cd to use the new code in production instead of
    jigdo-file.
    Updated 2004/07/14: Done: use the jte_support branch in debian-cd CVS
    if you want to use this.
 2. Work out why HFS hybrid images are too big.
    Updated 2004/07/14: This seems to be purely the space needed for the
    HFS information. I've checked a fix into debian-cd to fix it.
 3. Testing! :-) This is where you lot come in! Please play with this some
    more and let me know if you have any problems, especially with data
    corruption.
 4. More documentation.
 5. Push patches upstream. This may take a while...

I hope people find this useful - at the moment I shudder at the thought of
releasing sarge (10+ CDs, netinst, business card, 2 DVDs per arch) without
making this kind of change. It'll take a week to generate the release
images otherwise...