README for JTE version @VERSION@

Steve McIntyre <steve@einval.com>

License - GPL v2+. See the file COPYING for more details.

JTE - Jigdo Template Export
===========================

  • Introduction - jigdo and JTE
  • How to use jigit-mkimage
  • External integration

----------------------------------------------------------------------

Introduction - jigdo and JTE
----------------------------

Jigdo is a useful tool to help in the distribution of large files like CD
and DVD images. See Richard Atterer's site [1] for more details. Debian CD
and DVD ISO images are published on the web in jigdo format to allow end
users to download them more efficiently.

[1] http://atterer.org/jigdo/

Jigdo is generic and powerful - it can be used for any large files that
are made up of smaller files. However, this generality comes at a cost.
Creating jigdo files from ISO images is quite inefficient - to work out
which files are included in the ISO image, jigdo has to calculate and
compare checksums of every possible file and every extent in the image.
Essentially it has to brute-force the image. It can take a long time to do
this for a large image (imagine a 4.5GB DVD image or a 30+GB Blu-Ray
image).

I first started looking for ways to improve this back in 2004:

 1. Modify jigdo so it knew about the internals of ISO images and could
    efficiently scan them (bad, not very generic for jigdo)
 2. Write a helper tool to dump extra information for jigdo to use
    alongside the ISO image (I had a helper tool written, but modifying
    jigdo to use this looked HARD)
 3. Patch mkisofs/genisoimage to write .jigdo and .template files
    alongside the ISO image

I completed the third of these options, and called it JTE (or Jigdo
Template Export). The code worked fine, and ran in a very small fraction
of the time taken to run genisoimage and jigdo separately. The output
.jigdo and .template files worked correctly, i.e. jigdo-file and the
wrapper script jigdo-mirror accepted them and generated an ISO image
that exactly matched the original.

Debian used that code for a number of years within genisoimage, but we've
since switched over to using xorriso [2] for our image building instead.
It has a lot of useful features that we want compared to genisoimage, not
least a friendly and engaged author in Thomas Schmitt!

[2] https://www.gnu.org/software/xorriso/

Thomas, George Danchev and I worked together to package up my JTE code
into libjte such that xorriso could use it effectively. Xorriso has been
capable of generating jigdo files since 2010.

In late 2019, I took over maintenance of the jigdo upstream code and added
support for a new (v2) jigdo data format, using SHA256 instead of MD5
internally. See my jigdo page for more details about that. I have also
updated the JTE codebase to support this new format, of course.

As genisoimage is effectively dead at this point, I took the decision
not to add jigdo v2 support to the genisoimage codebase. If you need to
generate jigdo v2 format, either use jigdo itself or, if you'd like the
performance benefit of the libjte integration, xorriso.

JTE includes a few tools:

  • jigit-mkimage, a simple and very fast tool to reconstruct image files
    from .jigdo and .template files. It doesn't have any logic to cope
    with downloading missing files, but will list the missing files that
    are needed. It is also much faster for people (like me!) who already
    have full local mirrors.
  • parallel-sums, a simple extra utility to generate checksums quickly
    and efficiently, reading file data only once and calculating checksums
    with multiple algorithms in parallel using threads.
  • jigsum, jigsum-sha256 and rsyncsum, checksum tools which output
    checksums in jigdo's base64-like format rather than the normal
    hexadecimal format. Useful for debugging jigdo issues.
  • jigdump, a tool to dump the contents of a jigdo template or .iso.tmp
    file. Useful for debugging jigdo issues.
  • mkjigsnap, a utility to help with maintaining the "snapshots" that
    jigdo needs if you're going to be keeping data around for users in the
    long term. We use this on some Debian systems.

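
To make the checksum tools a little more concrete, here is a minimal
Python sketch (not the actual C code from parallel-sums or jigsum). It
reads the data once, feeds every chunk to several hash objects, and prints
each digest both in hexadecimal and in a base64-like form. The encoding
shown (URL-safe base64 alphabet with the "=" padding removed) is an
assumption about jigdo's format, and the function names are made up for
this example. Unlike parallel-sums, it does not use threads.

```python
import base64
import hashlib
import io


def multi_checksum(stream, algorithms=("md5", "sha256"), chunk_size=1 << 16):
    """Read `stream` once and return {algorithm name: raw digest bytes}."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Every algorithm sees the same chunk, so the data is only read once.
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.digest() for name, hasher in hashers.items()}


def jigdo_base64(digest):
    """Assumed jigdo-style encoding: URL-safe base64 without '=' padding."""
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


sums = multi_checksum(io.BytesIO(b"hello world\n"))
for name, digest in sums.items():
    print(name, digest.hex(), jigdo_base64(digest))
```

With this encoding a 16-byte MD5 digest becomes 22 characters and a
32-byte SHA256 digest becomes 43, noticeably shorter than the hexadecimal
forms.
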
Why the "jigit" name? The packages and source are named jigit to match the
name of a long-dead wrapper script. That script may be gone, but it's
easier to keep the name!

----------------------------------------------------------------------

The jigit source package (and hence the various binary packages it builds) is
included in the main Debian archive, so your best bet is to get binary packages
from there. Check for the current version(s) using tracker.debian.org.

Source and backported versions are in the download area [3] alongside
the current ChangeLog. All the files for download are PGP-signed for
safety. You can find my keys online if you need them [4].

jigit is maintained in git [5].

[3] https://www.einval.com/~steve/software/JTE/download/
[4] https://www.einval.com/~steve/pgp/
[5] https://git.einval.com/cgi-bin/gitweb.cgi?p=jigit.git.

----------------------------------------------------------------------

To use the jigdo creation code in xorriso, add some extra command line
options to control the jigdo features. You must specify the location of
the output .jigdo and .template files alongside the ISO image, and a
"checksum" list file, containing the checksums that you want JTE to match.
You can also specify a lot of other options to control the contents of the
.jigdo file. A complicated (but realistic) example from my own test setup
is here, with all the extra jigdo parameters explained below:

    xorriso -as mkisofs -r -J \
        -V 'Debian TEST amd64 n' \
        -o debian-TEST-amd64-NETINST-1.iso \
        -jigdo-jigdo debian-TEST-amd64-NETINST-1.jigdo \
        -jigdo-template debian-TEST-amd64-NETINST-1.template \
        -checksum_algorithm_iso sha256,sha512 \
        -checksum-list /tmp/buster/checksum-check \
        -jigdo-checksum-algorithm md5 \
        -jigdo-force-checksum /pool/ \
        -jigdo-min-file-size 1024 \
        -jigdo-exclude 'README*' \
        -jigdo-exclude /doc/ \
        -jigdo-exclude /md5sum.txt \
        -jigdo-exclude /.disk/ \
        -jigdo-exclude /pics/ \
        -jigdo-exclude 'Release*' \
        -jigdo-exclude 'Packages*' \
        -jigdo-exclude 'Sources*' \
        -jigdo-exclude boot1 \
        -jigdo-map Debian=/scratch/mirror/debian/ \
        -joliet-long -cache-inodes \
        -isohybrid-mbr syslinux/usr/lib/ISOLINUX/isohdpfx.bin \
        -b isolinux/isolinux.bin \
        -c isolinux/boot.cat \
        -boot-info-table -no-emul-boot \
        -eltorito-alt-boot -e boot/grub/efi.img \
        -no-emul-boot -isohybrid-gpt-basdat \
        -isohybrid-apm-hfsplus boot1 CD1

That's a long command line, but it's not too hard to follow:

  • -jigdo-jigdo specifies the output filename for the .jigdo file
  • -jigdo-template specifies the output filename for the .template file
  • -checksum_algorithm_iso specifies which checksums to include for the
    ISO image inside the .jigdo file
  • -checksum-list specifies the input filename for the checksum data that
    JTE should match against
  • -jigdo-checksum-algorithm specifies which checksum algorithm to use
    inside the jigdo file, both for describing the files in the ISO and
    the ISO itself. The allowed options are "md5" (i.e. jigdo format v1,
    the default), or "sha256" (i.e. jigdo format v2).
  • -jigdo-force-checksum adds a match pattern for the jigdo generation
    code - all files matching this pattern must be listed in the
    checksum-list, and they must have the correct checksum. This is used
    in Debian as a precaution, to make sure that the source data is
    correct for all the packages we're including on our media.
  • -jigdo-min-file-size and -jigdo-exclude are two different ways to stop
    certain files from being matched in the jigdo generation code. We
    don't want to waste time on files that are too small, or too temporary
    (e.g. generated during the CD build process itself), or that are not
    tracked cleanly with versions inside the Debian archive. Files
    excluded from the jigdo generation using these parameters will
    therefore be included directly in the .template raw data section.
  • Finally, one or more -jigdo-map entries should be added, to map
    pathnames in the .jigdo file to the [Servers] section.

If the -jigdo-* options are not used, the normal xorriso execution path is
not affected at all. The above invocation will create 3 output files
(.iso, .jigdo and .template). Multiple -jigdo-exclude and -jigdo-map
options are accepted, for multiple exclude and map patterns.

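
As an illustration of what a -jigdo-map entry is for, the Python sketch
below rewrites a local pathname into a "Label:relative/path" reference,
using the Debian=/scratch/mirror/debian/ mapping from the example command.
This is not libjte code. The "Label:path" output form and the
longest-prefix rule are assumptions made for the example, and the package
path is made up.

```python
def to_jigdo_uri(path, maps):
    """Return 'Label:relative/path' for the longest matching prefix, else None."""
    best = None
    for label, prefix in maps.items():
        # Assumption: when several maps match, prefer the most specific one.
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[1])):
            best = (label, prefix)
    if best is None:
        return None
    label, prefix = best
    return f"{label}:{path[len(prefix):]}"


maps = {"Debian": "/scratch/mirror/debian/"}
local = "/scratch/mirror/debian/pool/main/h/hello/hello_2.10-2_amd64.deb"
print(to_jigdo_uri(local, maps))
```

The label on the left of the colon is what a reader of the .jigdo file
would then look up in the [Servers] section to find a real download
location.
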
----------------------------------------------------------------------

Internally in libisoburn (and hence xorriso), every place that writes
image data also calls into libjte to offer that data for jigdo
processing. Any file data entries are passed through with information
about the original file. If that file is not excluded (because of its path
or size, as mentioned), JTE will grab the filename, the size of the file
and the checksum of the file's data. If that checksum and size match an
entry in the input checksum-list, JTE will write a file match record into
the template file (and then the jigdo file) instead of the file data
itself. For anything else (excluded files, directory data, etc.), raw data
is simply copied through and compressed into the template file.
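
The decision described above can be sketched roughly as follows. This is a
simplified Python illustration, not the libjte implementation: the
function name, the in-memory set standing in for the checksum-list, and
the use of regular expressions searched anywhere in the path are all
assumptions made for the example.

```python
import hashlib
import re


def classify(path, data, checksum_list, excludes=(), min_size=0, force=()):
    """Return 'match' or 'raw' for one file offered for jigdo processing."""
    digest = hashlib.sha256(data).hexdigest()
    matched = (
        len(data) >= min_size                                  # not too small
        and not any(re.search(p, path) for p in excludes)      # not excluded
        and (digest, len(data)) in checksum_list               # known checksum + size
    )
    # Files under a "force" pattern must match, otherwise the build should fail.
    if not matched and any(re.search(p, path) for p in force):
        raise ValueError(f"{path}: required checksum match not found")
    return "match" if matched else "raw"


deb = b"pretend package contents" * 100
known = {(hashlib.sha256(deb).hexdigest(), len(deb))}

print(classify("/pool/main/h/hello.deb", deb, known,
               excludes=["/doc/"], min_size=1024, force=["/pool/"]))
print(classify("/doc/FAQ.txt", b"small", known,
               excludes=["/doc/"], min_size=1024))
```

A "match" result corresponds to writing a file match record; "raw" means
the data would be copied into the template as-is.
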

----------------------------------------------------------------------

How to use jigit-mkimage
------------------------

jigit-mkimage is a faster, more minimal version of "jigdo-file
make-image", written in portable C. It takes a few options:

┌──────────────────┬──────────────────────────────────────────────────────────────┐
│-f <MD5 file>     │Specify a file containing MD5sums for files we should attempt │
│                  │to use when rebuilding the image                              │
├──────────────────┼──────────────────────────────────────────────────────────────┤
│-F <SHA256 file>  │Specify a file containing SHA256sums for files we should      │
│                  │attempt to use when rebuilding the image                      │
├──────────────────┼──────────────────────────────────────────────────────────────┤
│-j <jigdo file>   │Specify the input jigdo file                                  │
├──────────────────┼──────────────────────────────────────────────────────────────┤
│-t <template file>│Specify the input template file                               │
├──────────────────┼──────────────────────────────────────────────────────────────┤
│-m <item=path>    │Map <item> to <path> to find the files in the mirror          │
├──────────────────┼──────────────────────────────────────────────────────────────┤
│-M <Missing file> │Don't attempt to build the image; just verify that all the    │
│                  │components needed are available. If some are missing, list    │
│                  │them in the specified file.                                   │
├──────────────────┼──────────────────────────────────────────────────────────────┤
│-v                │Make the output logging more verbose.                         │
├──────────────────┼──────────────────────────────────────────────────────────────┤
│-l <log file>     │Specify a logfile. If not specified, will log to stderr just  │
│                  │like genisoimage                                              │
├──────────────────┼──────────────────────────────────────────────────────────────┤
│-q                │Don't bother checking checksums of the input files, or of the │
│                  │output image.                                                 │
│                  │WARNING: this may lead to corrupt images, but is faster as    │
│                  │less work is done.                                            │
├──────────────────┼──────────────────────────────────────────────────────────────┤
│-s <start offset> │Specify where to start in the image (in bytes). If not        │
│                  │specified, will start at the beginning (offset 0). Added for  │
│                  │iso-image.pl use                                              │
├──────────────────┼──────────────────────────────────────────────────────────────┤
│-e <end offset>   │Specify where to end in the image (in bytes). If not          │
│                  │specified, will run all the way to the end of the image. Added│
│                  │for iso-image.pl use                                          │
├──────────────────┼──────────────────────────────────────────────────────────────┤
│-z                │Don't attempt to reassemble the image; simply parse the image │
│                  │descriptor in the template file and print the image size.     │
│                  │Added for iso-image.pl use                                    │
└──────────────────┴──────────────────────────────────────────────────────────────┘

Specifying a start or end offset implies -q - it's difficult to check
checksums if the full image is not generated!
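
As a usage sketch, the Python snippet below assembles a jigit-mkimage
command line that only checks whether all the needed files are available
(-M), using the options from the table. The file names are made up, and
-t as the option for the template file is an assumption here. The snippet
only builds and prints the command; running it is left to the reader.

```python
import shlex


def mkimage_check_cmd(jigdo, template, maps, missing_file, sha256_file=None):
    """Build an argv list for a 'verify only' jigit-mkimage run."""
    cmd = ["jigit-mkimage", "-j", jigdo, "-t", template]
    if sha256_file:
        cmd += ["-F", sha256_file]
    for label, path in maps.items():
        # One -m per mapping, in the <item=path> form from the table.
        cmd += ["-m", f"{label}={path}"]
    # -M: don't build the image, just list missing components in this file.
    cmd += ["-M", missing_file]
    return cmd


cmd = mkimage_check_cmd(
    "debian-TEST-amd64-NETINST-1.jigdo",
    "debian-TEST-amd64-NETINST-1.template",
    {"Debian": "/scratch/mirror/debian/"},
    "missing.txt",
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run() unchanged, which
avoids any shell quoting problems with unusual path names.
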

----------------------------------------------------------------------

I had extra plans for JTE that never really came to fruition due to a lack
of time and energy... :-/ Check git history if you're interested.

iso-image.pl - on-the-fly rebuild of ISO images for HTTP

iso-image.pl was a small Perl wrapper script written to drive
jigit-mkimage and turn it into a CGI. It would parse the incoming request
(including byte-ranges) and call jigit-mkimage to actually generate the
image pieces.

This code worked, but was always too slow for production use. Each CGI
request needed to index into the ISO image independently, leading to lots
and lots of overlapping calls to decompress the template data.
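
For the curious, the byte-range handling mentioned above can be sketched
like this. It is a Python illustration rather than the original Perl: it
turns a single HTTP "Range: bytes=..." value into start and end byte
offsets of the kind that could be passed via -s and -e, given the image
size (which -z can report). Whether -e expects an inclusive or exclusive
end offset is not covered here; the sketch returns the inclusive HTTP
form, and the function name is made up.

```python
import re


def range_to_offsets(header, image_size):
    """Parse 'bytes=FIRST-LAST' into (start, end), both inclusive."""
    match = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if match is None:
        raise ValueError(f"unsupported Range header: {header!r}")
    first, last = match.groups()
    if first:
        # 'bytes=100-199' or the open-ended 'bytes=100-'
        start = int(first)
        end = min(int(last), image_size - 1) if last else image_size - 1
    elif last:
        # Suffix form 'bytes=-500': the final 500 bytes of the image
        start = max(image_size - int(last), 0)
        end = image_size - 1
    else:
        raise ValueError("empty byte range")
    if start > end:
        raise ValueError("unsatisfiable byte range")
    return start, end


print(range_to_offsets("bytes=0-1023", 5000))
print(range_to_offsets("bytes=4000-", 5000))
print(range_to_offsets("bytes=-500", 5000))
```
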

jigdoofus - a better way to do on-the-fly assembly

I started on a new project, creating a FUSE-based filesystem that would
rebuild ISOs on the fly. I decided to use a database backend and a caching
system to solve the problem of the repetitive decompression that held back
iso-image.pl. I made some progress, but ran out of steam. Code is still in
the "jigdoofus" branch in git in case anybody ever finds it useful.

jigit - a friendly wrapper for jigit-mkimage

Similarly to the jigdo-lite script in the jigdo package, I wanted to
provide a nicer user experience for easy downloading of Debian and Ubuntu
CD images. It worked, but never really gained much traction. It needed
much more effort to make things reliable for production use.

----------------------------------------------------------------------

External integration
--------------------

The debian-cd package in Debian is what we use to generate installer CDs
and DVDs. It has supported JTE since 2005, and we still use it every day.

genisoimage in Debian shipped with integrated JTE code for a long time,
but is basically dead upstream. Not recommended for use any more.

xorriso uses libjte to generate jigdo and template files, and has worked

----------------------------------------------------------------------

 1. Testing! :-) This is where you lot come in! Please play with this some
    more and let me know if you have any problems, especially with data
 2. More documentation.