11 years ago  Add scraper to grab bits of text and style from www.d.o/CD/
Steve McIntyre [Thu, 14 Apr 2011 20:44:48 +0000 (21:44 +0100)]
Add scraper to grab bits of text and style from www.d.o/CD/

Grab$LANG.html, split up and
search/replace some of the text to make it useful for the search CGI.

11 years ago  Add prettiness so we look more like the CD pages on www.d.o  [v0.3]
Steve McIntyre [Thu, 14 Apr 2011 20:41:30 +0000 (21:41 +0100)]
Add prettiness so we look more like the CD pages on www.d.o

Add support for reading in various lumps of HTML that we can scrape.

Reference the standard debian.css stylesheet.

Refactor generation a little.

11 years ago  Add message to say exact lookups are faster
Steve McIntyre [Tue, 5 Apr 2011 18:15:11 +0000 (19:15 +0100)]
Add message to say exact lookups are faster

11 years ago  Count the results on the direct lookup too
Steve McIntyre [Tue, 5 Apr 2011 18:12:23 +0000 (19:12 +0100)]
Count the results on the direct lookup too

11 years ago  If we're given an exact search term, shortcut
Steve McIntyre [Tue, 5 Apr 2011 18:04:36 +0000 (19:04 +0100)]
If we're given an exact search term, shortcut

If we're given an exact term containing no glob characters, then we
may as well save a lot of time and just do the direct key lookup in
the database!
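
The shortcut described above can be sketched roughly as follows. This is
an illustrative Python sketch, not the code from this commit: the names
is_exact and search, the choice of "*", "?" and "[" as glob characters,
and the use of a plain dict as the database are all assumptions made for
the example.

```python
import fnmatch

# Assumption: these are the characters that make a term a glob pattern.
GLOB_CHARS = set("*?[")


def is_exact(term):
    """Return True when the term contains no glob metacharacters."""
    return not any(ch in GLOB_CHARS for ch in term)


def search(db, term):
    """Return {key: value} for every database key matching term.

    db is any mapping from file name to its recorded locations
    (a plain dict here, standing in for the real hash database).
    """
    if is_exact(term):
        # Direct key lookup: a single hash probe, no scan of the keys.
        return {term: db[term]} if term in db else {}
    # Glob search: every key has to be tested against the pattern.
    return {k: v for k, v in db.items() if fnmatch.fnmatchcase(k, term)}
```

With a database of many keys, the exact branch avoids walking the whole
key set, which is where the time saving mentioned above would come from.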

Simplify the two query scripts: make the actual search code
common. Might split out into a separate module later, or even combine
the scripts.

11 years ago  Add copyright headers and boilerplate
Steve McIntyre [Tue, 5 Apr 2011 17:41:23 +0000 (18:41 +0100)]
Add copyright headers and boilerplate

11 years ago  Simple tools to query the CD contents database  [v0.2]
Steve McIntyre [Mon, 4 Apr 2011 13:10:09 +0000 (14:10 +0100)]
Simple tools to query the CD contents database

Two tools:

 * is a command-line interface
 * find_file.cgi is a simple web interface.

11 years ago  Initial code for generating CD contents database
Steve McIntyre [Mon, 4 Apr 2011 12:57:36 +0000 (13:57 +0100)]
Initial code for generating CD contents database

Scan all the areas defined for .list.gz files, parse the contents and
build a hash database per area ready for users to work with.
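
As a rough illustration of the scan-and-build step described above, the
Python sketch below walks a directory for .list.gz files and builds one
in-memory mapping per area. It is not the code from this commit, and it
rests on several assumptions: each .list.gz is taken to hold one file
path per line, the image name is taken from the list file's own name, the
lookup key is the path's basename, and the names build_area_db and
build_all are hypothetical.

```python
import gzip
import os
from collections import defaultdict


def build_area_db(area_dir):
    """Map each file's basename to the sorted list of images containing it."""
    db = defaultdict(set)
    for root, _dirs, files in os.walk(area_dir):
        for name in files:
            if not name.endswith(".list.gz"):
                continue
            # Assumption: the image is identified by the list file's name.
            image = name[: -len(".list.gz")]
            list_path = os.path.join(root, name)
            with gzip.open(list_path, "rt", encoding="utf-8") as fh:
                for line in fh:
                    path = line.strip()
                    if path:  # skip blank lines
                        db[os.path.basename(path)].add(image)
    return {key: sorted(images) for key, images in db.items()}


def build_all(areas):
    """Build one database per area; areas maps area name -> directory."""
    return {area: build_area_db(path) for area, path in areas.items()}
```

A persistent on-disk hash (for example via Python's dbm module) could
replace the returned dict; the in-memory form is used here only to keep
the sketch short.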