Rob Tillotson
rob@pyrite.org
The Doc format is a simple text database format used by e-book readers on the Palm Computing Platform. Although there are a few proprietary e-book formats still in use, the Doc format is by far the most widely supported way to store long texts on the Palm platform, with over a dozen readers and editors supporting it.
In its simplest form, a Doc database is just compressed text, in a form designed for efficient storage and navigation on the palmtop. However, many newer readers support some form of "rich text" markup, either a subset of HTML or their own unique tagging schemes.
While all of these augmented readers can be an improvement over a traditional text-only reader, the proliferation of markup formats means that every one of them needs to have its own conversion program (which, often, is not available for any OS except Windows) to create Doc databases with the proper markup. It also means that the same Doc database might or might not be readable on a different reader, depending on what sort of markup is in it. Pyrite Publisher is intended to address these problems (to some extent) by providing a conversion framework with pluggable modules.
Pyrite Publisher separates the interpretation of its input from the rendering of that input into a specific markup language. During conversion, an input plugin is used to convert the input (usually text, HTML, or something similar) to a sequence of formatting events which are independent of any particular markup language; an output plugin converts those into the markup used by a particular Doc reader and creates the resulting Doc database.
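The separation described above can be pictured with a small Python sketch. It is illustrative only: the event kinds, the function names, and the `<H>` tag are invented here and are not Pyrite Publisher's actual event set or API. One parser produces markup-independent events, and two interchangeable renderers consume them.

```python
from typing import Iterable, Iterator, Tuple

# An event is a (kind, text) pair. The kinds used here are invented for
# illustration; they are not Pyrite Publisher's real event names.
Event = Tuple[str, str]

def parse_simple_markup(source: str) -> Iterator[Event]:
    """Hypothetical input plugin: lines starting with '# ' become headings."""
    for line in source.splitlines():
        if line.startswith("# "):
            yield ("heading", line[2:])
        elif line.strip():
            yield ("paragraph", line.strip())

def render_plain(events: Iterable[Event]) -> str:
    """Hypothetical output plugin that ignores markup entirely."""
    return "\n".join(text for _kind, text in events)

def render_tagged(events: Iterable[Event]) -> str:
    """Hypothetical output plugin that wraps headings in a made-up tag."""
    out = []
    for kind, text in events:
        out.append(f"<H>{text}</H>" if kind == "heading" else text)
    return "\n".join(out)

source = "# Chapter 1\nIt was a dark and stormy night."
plain = render_plain(parse_simple_markup(source))
tagged = render_tagged(parse_simple_markup(source))
```

The same parser output feeds both renderers, which is the property that lets a single input plugin serve several readers.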
Pyrite Publisher is a command-line program. A graphical user interface may be added in the future, but at present it can be run from the command line only. Normally, Pyrite Publisher's command line interface is called pyrpub, so the most basic way to use it is something like this:
% pyrpub foo.txt
Pyrite Publisher will try to convert the named file in the best way it knows how, based on the type of data in the file. For example, if you give Pyrite Publisher an HTML file as input, it will automatically use its HTML parser. The result of the conversion will be a file named foo.pdb which you can install on your handheld.
Most of Pyrite Publisher is built out of a set of plugins, which are assembled into chains to convert an input file to a Palm database. Each plugin has an input and an output; at the beginning of the chain the first plugin's input is the file you specify on the command line, and at the end of the chain the last plugin's output is the database you install on your handheld. In between, each plugin analyzes and modifies the data before passing it to the next plugin in the chain.
When you run Pyrite Publisher, it tries to construct a chain of plugins that can successfully convert the input file to a database. The plugins supplied with Pyrite Publisher are designed to be interchangeable so you can get different results simply by using different plugins.
You can choose plugins to use by using the -P command-line option. Pyrite Publisher will attempt to use the plugins you name, filling in the rest of the chain with whatever defaults are appropriate (unless the requested combination is impossible). You do not need to specify the whole chain, and order doesn't matter. For example, to use the tagged-text parser and create a zTXT instead of a regular Doc database, you could do the following:
% pyrpub -P TaggedText,zTXT foo.txt
Each plugin may have its own command-line options to control its behavior. The help text (viewable with the -h option) shows the available options for all plugins, and you can list all of the installed plugins with pyrpub -l.
In most cases, the process of converting an input file to an e-book database is divided into four parts, each handled by one plugin:
This process allows for support of multiple document formats without a lot of code duplication. A single parser plugin can format the same input for many different e-book readers, given appropriate choices for the following plugins.
Internally, Pyrite Publisher links plugins into chains by the use of named protocols. A protocol is simply a specification for how two plugins talk to one another. For example, the doc-assembler protocol describes how a parser plugin like HTML talks to a markup plugin like RichReader. Every plugin knows what input protocols it supports, and what output protocol it will use when given a particular type of input, and Pyrite Publisher uses this information to decide what chains of plugins are possible.
The only exception to this is at the beginning of the chain. The first plugin in the chain is responsible for getting input from outside of Pyrite Publisher, from a file or URL or some other location. Its "output protocol" is actually the MIME type of the data.
For example, if you tell Pyrite Publisher to convert the file foo.html (and you have the standard set of plugins), a typical chain would go something like this:
Because the input is not a .pdb or .prc file, the URLStream plugin is activated to read the input file.
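Protocol matching of this kind can be sketched as a small search over plugin declarations. The following Python is only a model, not Pyrite Publisher's implementation: the plugin names and the "text/html", "text/plain", and "doc-assembler" protocols come from this manual, while "doc-output" and "database" are placeholder protocol names invented for the example, and each plugin is simplified to a single input/output pair.

```python
from typing import List, Tuple

# (plugin name, input protocol, output protocol).
# "doc-output" and "database" are placeholder names, not real protocols.
PLUGINS: List[Tuple[str, str, str]] = [
    ("HTML", "text/html", "doc-assembler"),
    ("Text", "text/plain", "doc-assembler"),
    ("BasicDoc", "doc-assembler", "doc-output"),
    ("RichReader", "doc-assembler", "doc-output"),
    ("DocOutput", "doc-output", "database"),
]

def find_chains(protocol: str, goal: str) -> List[List[str]]:
    """Return every plugin sequence that turns `protocol` into `goal`.

    Assumes the declarations contain no cycles, which holds for the
    table above; a real implementation would need to guard against them.
    """
    if protocol == goal:
        return [[]]
    chains = []
    for name, accepts, produces in PLUGINS:
        if accepts == protocol:
            for rest in find_chains(produces, goal):
                chains.append([name] + rest)
    return chains

chains = find_chains("text/html", "database")
```

With these declarations, `chains` holds two candidates, one through BasicDoc and one through RichReader. Choosing between such candidates is what the priority system is for.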
Some protocols are implemented by multiple plugins. In the standard set of plugins, for example, there are at least three that handle text/plain and three that handle doc-assembler. This means there are many possible ways to build a chain that converts a particular file into some kind of output. Pyrite Publisher chooses which one of these possible chains to use based on a priority system that lets each plugin declare how good its handling of a particular protocol is.
For example, among the standard set of plugins, plain text files can be handled by the Text, RawText, TaggedText, and HTML plugins. Of those, the most generally useful choice is Text, so it declares a higher priority for the text/plain protocol than the others.
If you name one or more plugins on the command line, Pyrite Publisher will choose a chain that includes them and use priorities to resolve any ambiguities that still exist. You can adjust the priorities of individual plugins using the configuration file, described below.
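As a rough model of that selection, the Python sketch below picks one plugin for a single protocol. It uses the priorities this manual lists for the markup generator plugins (BasicDoc 10, RichReader 0, TealDoc 0); the function name, its parameters, and the tie-breaking behavior are assumptions made for illustration, not Pyrite Publisher's real logic.

```python
from typing import Dict, Iterable, Optional

# Declared priorities for the markup generator plugins, as listed in
# this manual. Higher numbers win.
MARKUP_PRIORITIES: Dict[str, int] = {"BasicDoc": 10, "RichReader": 0, "TealDoc": 0}

def choose_plugin(
    priorities: Dict[str, int],
    requested: Iterable[str] = (),
    overrides: Optional[Dict[str, int]] = None,
) -> str:
    """Pick one plugin for a protocol (hypothetical helper).

    `requested` models plugins named with -P; `overrides` models
    priority statements from the configuration file.
    """
    effective = {**priorities, **(overrides or {})}
    wanted = set(requested)
    named = [name for name in effective if name in wanted]
    # Plugins named on the command line take precedence; priorities
    # only resolve whatever ambiguity is left.
    candidates = named or list(effective)
    return max(candidates, key=lambda name: effective[name])

default_choice = choose_plugin(MARKUP_PRIORITIES)
named_choice = choose_plugin(MARKUP_PRIORITIES, requested=["RichReader"])
config_choice = choose_plugin(MARKUP_PRIORITIES, overrides={"TealDoc": 100})
```

Here `default_choice` is BasicDoc, naming RichReader selects it regardless of priority, and raising TealDoc's priority to 100 makes it the default.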
Each Pyrite Publisher plugin can have its own set of configurable options, called properties. The command pyrpub -list-properties will display a list of all plugin properties.
At startup, Pyrite Publisher reads a configuration file called .pyrpubrc in your home directory. This configuration file can contain two types of statements: set statements which set a plugin property, and priority statements which control plugin priorities.
A set statement looks like this:
The semicolon at the end of the line is required. String values don't need to be quoted unless they contain non-alphanumeric characters. For boolean properties, the value should be either "1" or "0".
The priority statement comes in three forms:
The first form sets a plugin's priority for a specific pair of input and output protocols; if that combination is invalid there will be no effect. The second form sets the plugin's priority for a specific input protocol and all possible outputs. The third form sets the plugin's priority for all possible combinations of input and output.
The second form is probably the most useful. If you use TealDoc all the time, for example, the following statement in your .pyrpubrc file might be helpful:
priority TealDoc "doc-assembler" 100
The plugins described in this section are provided with Pyrite Publisher. If any of them are missing, it is likely that your installation is broken.
Unless otherwise specified, all plugin properties can be specified on the command line, by translating underscores to dashes. For example, a property called some_property will have a corresponding command line option called --some-property.
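The naming rule is mechanical enough to express in two lines of Python. The helper names below are hypothetical and exist only to illustrate the translation; they are not part of Pyrite Publisher.

```python
def property_to_option(name: str) -> str:
    """Translate a property name into its long command-line option."""
    return "--" + name.replace("_", "-")

def option_to_property(option: str) -> str:
    """Inverse translation, from a long option back to a property name."""
    return option.lstrip("-").replace("-", "_")

option = property_to_option("some_property")
```

For the example in the text, `option` is `--some-property`.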
The standard plugins use the following protocols:
| Protocol | Priorities | Description |
|---|---|---|
|  | RawText 0 | MIME type for arbitrary data input |
| text/html | HTML 0 | MIME type for HTML input |
| text/plain | Text 0, HTML -10, TaggedText -10 | MIME type for text input |
| doc-assembler | BasicDoc 10, RichReader 0, TealDoc 0 | Markup generator |
|  | DocOutput 0, TextOutput -1000, zTXT -1 | General e-book database output |
|  | DocOutput 0, TextOutput -1000 | Doc-format database output |
|  | CopyDoc 0 | Special protocol for e-book metadata passing |
The URLStream plugin is the standard input handler. It allows Pyrite Publisher to retrieve input from local files or from remote locations specified by http or ftp URLs. If the file or remote URL ends in .gz it will be un-gzipped automatically. This plugin also determines the MIME type of the input, which in turn determines what the next plugin in the chain will be.
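A comparable behavior can be approximated with Python's standard library, as in the sketch below. This is not URLStream's code: the function names are made up, only local files are handled (no http or ftp retrieval), and the MIME type is guessed from the file name alone.

```python
import gzip
import mimetypes
from typing import Optional, Tuple

def sniff(name: str) -> Tuple[Optional[str], bool]:
    """Guess the MIME type from a file name and report whether it is gzipped.

    mimetypes.guess_type() looks through a trailing .gz, so
    "foo.html.gz" is reported as text/html with gzip encoding.
    """
    mime_type, encoding = mimetypes.guess_type(name)
    return mime_type, encoding == "gzip"

def read_input(path: str) -> Tuple[Optional[str], bytes]:
    """Read a local file, transparently un-gzipping it when needed."""
    mime_type, gzipped = sniff(path)
    opener = gzip.open if gzipped else open
    with opener(path, "rb") as stream:
        return mime_type, stream.read()

html_guess = sniff("foo.html.gz")
text_guess = sniff("foo.txt")
```

The guessed MIME type plays the role of the first plugin's output protocol: it decides which parser comes next in the chain.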
The URLStream plugin has no properties or command-line options.
The PDBInput plugin handles input from files or URLs ending in .pdb or .prc. Unlike URLStream, it does not automatically un-gzip its input. If the input is a Doc or zTXT database, it will be decompressed and treated as if it were a text file. In addition, metadata such as the document title and bookmarks may be passed along if the next plugin in the chain is compatible. If the input is not a Doc or zTXT database, or is not a Palm database at all, it will not be converted.
The PDBInput plugin has no properties or command-line options.
The RawText plugin takes raw text input and passes it along to an assembler plugin without any additional processing whatsoever.
The RawText plugin has no properties or command-line options.
The Text plugin takes text input and re-flows it to be more readable in a handheld document reader. Generally this consists of joining paragraphs into long lines so that they will be wrapped by the reader on the handheld. In addition, this plugin can automatically add bookmarks based on a regular expression search.
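The two ideas in this paragraph, paragraph joining and regex-driven bookmarks, can be sketched in Python as follows. This is an approximation under stated assumptions, not the Text plugin's algorithm: paragraphs are assumed to be separated by blank lines, and the function names and the sample pattern are invented for the example.

```python
import re
from typing import List, Tuple

def reflow(text: str) -> str:
    """Join each blank-line-separated paragraph into one long line."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)

def find_bookmarks(text: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (character offset, matched text) for every pattern match."""
    return [
        (match.start(), match.group(0))
        for match in re.finditer(pattern, text, re.MULTILINE)
    ]

raw = "Chapter 1\n\nIt was a dark\nand stormy night.\n\nChapter 2\n\nThe rain\nfell."
flowed = reflow(raw)
bookmarks = find_bookmarks(flowed, r"^Chapter \d+$")
```

After reflowing, each paragraph is a single line that the reader on the handheld can wrap to its own screen width, and `bookmarks` records where each chapter heading starts.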
The Text plugin has the following properties:
The TaggedText plugin does the same sort of paragraph reformatting as the Text plugin, but it also interprets special markup tags embedded in the text to set bookmarks, add annotations and headers, and the like. This is intended primarily to make it easier to produce e-books from plain text files, by marking interesting bits without requiring a full markup language like HTML.
The tags supported by this plugin are as follows:
By default, all tags must appear on a line by themselves, prefixed by a single period.
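The default tag syntax (a tag alone on its line, prefixed by a single period) is simple to recognize. The Python sketch below separates tag lines from body text. The tag names `bookmark` and `note`, the way arguments follow the tag name, and the function itself are all hypothetical; they are placeholders chosen for the example and are not taken from the plugin's actual tag list.

```python
from typing import List, Tuple

# (index of the body line the tag precedes, tag name, argument text)
Tag = Tuple[int, str, str]

def split_tags(text: str) -> Tuple[List[Tag], List[str]]:
    """Separate period-prefixed tag lines from ordinary text lines."""
    tags: List[Tag] = []
    body: List[str] = []
    for line in text.splitlines():
        # Exactly one leading period followed by a letter marks a tag line.
        if line.startswith(".") and line[1:2].isalpha():
            name, _, argument = line[1:].partition(" ")
            tags.append((len(body), name, argument.strip()))
        else:
            body.append(line)
    return tags, body

# "bookmark" and "note" are made-up tag names used only for this example.
sample = ".bookmark Chapter 1\nIt was a dark night.\n.note remember this\nThe end."
tags, body = split_tags(sample)
```

Each tag is stored with the index of the body line it precedes, so a later stage could attach it to the right position in the text.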
The TaggedText plugin has the following properties:
The HTML plugin parses HTML input and creates markup events which the next plugin in the chain can use to display it appropriately on the handheld. (If the next plugin ignores markup, the result will be to simply strip HTML tags from the input.) It can also produce footnotes showing link targets and set bookmarks at HTML headers and anchors.
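Tag stripping with link footnotes can be sketched using Python's standard html.parser module. This is a minimal illustration, not the HTML plugin's code: the class name, the "[%s]" marker format, and the example URL are assumptions, and headers, anchors, and other markup events are ignored.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

class LinkFootnoter(HTMLParser):
    """Strip tags, replacing each link with a numbered footnote marker."""

    def __init__(self, marker_format: str = "[%s]") -> None:
        super().__init__()
        self.marker_format = marker_format  # assumed marker style
        self.chunks: List[str] = []
        self.footnotes: List[str] = []
        self._open_links: List[Optional[int]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.footnotes.append(href)
                self._open_links.append(len(self.footnotes))
            else:
                self._open_links.append(None)  # named anchor, no target

    def handle_endtag(self, tag):
        if tag == "a" and self._open_links:
            number = self._open_links.pop()
            if number is not None:
                self.chunks.append(self.marker_format % number)

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(html: str) -> Tuple[str, List[str]]:
    parser = LinkFootnoter()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks), parser.footnotes

text, notes = html_to_text('<p>See <a href="http://example.com/">the site</a> for more.</p>')
```

The link text is kept in place with a marker after it, and the target is collected in `notes` so it can be printed as a footnote.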
The HTML plugin has the following properties:
The BasicDoc plugin creates a plain-text e-book, ignoring any markup the previous plugin asks for.
The BasicDoc plugin has the following properties:
%s which will be replaced by the footnote number.
The TealDoc plugin creates a Doc e-book with markup that is viewable in the TealDoc application. (The resulting document can be read in a normal Doc reader, but the markup will be visible as tags embedded in the text.)
The TealDoc plugin has the following properties:
%s which will be replaced by the footnote number.
The RichReader plugin creates a Doc e-book with markup that is viewable in the RichReader application. (The resulting document can be read in a normal Doc reader, but the markup will be visible as ``garbage'' characters embedded in the text.)
The RichReader plugin has the following properties:
%s which will be replaced by the footnote number.
The DocOutput plugin creates a Doc-format e-book database, usable by any Doc-compatible application.
The DocOutput plugin has the following properties:
"REAd". (Command line option -C)
"TEXt". (Command line option -T)
The zTXT plugin creates a zTXT-format e-book database. The zTXT format is currently supported by the Weasel reader, and provides annotations and better compression than the Doc format.
The zTXT plugin has the following properties:
The TextOutput plugin sends output to the console or to a text file, instead of putting it in a database, to facilitate conversion of e-books back to plain text.
The TextOutput plugin has the following properties:
The CopyDoc plugin works with the PDBInput plugin to copy a document directly to an output plugin without any parsing or other processing. It is intended to allow one document format to be converted to another; for example, the following command will convert an existing e-book to zTXT format, preserving bookmarks and annotations if possible:
% pyrpub -P CopyDoc,zTXT foo.pdb
This section will be filled in later.