Publishing techniques with XML 2018

Overview

XML is widely used in publishing workflows, for both print and electronic media.
This course shows you ways to manage the workflow, the interaction between
content and people, and the publishing processes, as well as ways to structure
the documents themselves.

We will explore useful XML techniques, and discuss the use of some standard
schemas including DocBook and TEI for both traditional publishing and digital
humanities. Whether you work for a large publisher, in academia, or for a small
organization, you will find that learning more about publishing techniques that
incorporate the features of XML helps ensure that your valuable information and
its structure can be controlled and managed.

This course is chaired by Peter Flynn and taught by Norm Walsh, Nic Gibson, Tony Graham, and Tomos Hillman.

Classes for 2018

The Publishing Techniques with XML course runs on
and
.

Introduction to XML in publishing

Taught by Tomos Hillman.

This session addresses the impact of technology on publishing, exploring trends
of abstraction, separation of concerns, and profitability. We go on to discuss
the strengths and weaknesses of XML in publishing, and explore what this should
mean for how we plan our content and workflows.

Capturing XML Content

Taught by Tomos Hillman.

Starting from the general principles established in the introduction, this
session compares approaches to capturing XML content. We’ll take some time
to look at quality control, discussing the benefits of technologies like schemas
and Schematron, and considering documentation needs. As well as discussing the
challenges of working with external typesetters and capturers, we’ll look at
some of the possibilities and pitfalls of authoring directly in XML.

Underlying Technologies

Taught by Norm Walsh.

In this section of the course, we’ll turn our attention to the technology choices
available: schema languages, validation technologies, and processing tools.
We’ll consider vocabulary concepts: What makes a good schema? Should you build
your own or use an existing standard? How do JATS, DocBook, DITA, etc. compare?
How can you tell what’s right for your organization? What processing tools are
available and how can you leverage them? Should your workflow include Markdown
or other non-XML structured markup languages? How can you leverage linked data
in your publishing workflow? We’ll leave time for questions and discussion of
the particular challenges facing our delegates.

This session starts after lunch and continues after the break.

Introduction to CSS for Paged Media

Taught by Tony Graham.

CSS can be used for making pages as well as for styling websites. Many people are
familiar with CSS in the browser: some are very familiar, but others, not so
much. Fewer people, however, are as familiar with using CSS for paged media.

This session takes an eat-your-own-dog-food approach to showing how to use CSS
for paged media. Starting with the HTML text of a tutorial on using CSS with
paged media, the session will progressively add and explain the CSS styles that
are used to format the finished text. By the end of the session, the fairly
undifferentiated mass of text will be formatted with running headers and
footers, page numbers and page number cross-references, bleeds, CMYK colours,
footnotes, top- and bottom-floats, and other features that appear in paged media
but not in a browser. The session will also describe accessibility features of
PDF.

Because of the comparatively short duration of the session, there is only time to
cover the CSS features specific to paged media.

Making ‘Pages’

Taught by Tony Graham.

Anyone working with XML and text documents is eventually going to have to format
their documents for people to read; XSL-FO and CSS are among the obvious means.
This part gives a review of some of the options available to you for producing
traditional paged media or book-like output, and how it can be managed alongside
non-paged editions of the documents. The presentation includes a review of the
current state of XSL-FO, CSS, and EPUB as well as the specific support
available for TEI, DocBook, JATS, and DITA.

Document models: structure and semantics

Taught by Nic Gibson.

Differing XML models provide differing semantic models. Publishers’ content will
match some models better than others. We will examine the semantic depth of
common models such as JATS, DocBook, and (X)HTML and look at how differing
content can be modelled with XML. We will look at lessons we have learned as
XML-based publishing has become part of the mainstream of the publishing
industry. We will look at successful XML implementations and consider mistakes that have
been made (and how we can avoid them). We’ll particularly consider the idea that
every publisher needs their own schema and why this is almost always a mistake.
We will consider how metadata can be used for bibliographic and for marketing
purposes and how metadata standards can be used to improve the quality of
content when we are publishing to multiple output channels.

This session starts after lunch and continues after the break.