Trends and Transients 2019


Each year there are more new technologies to keep track of, more ways to organise
your life and your company’s information, more ways to communicate. This session
will introduce you to new and potentially over-hyped technologies, discuss
older, overlooked technologies, and entertain you at the same time. Our expert
speakers will present and debate current issues, giving you the benefit of their
wide experience and differing points of view, so you can decide for yourself
which technologies will meet your needs and which are a waste of your time.

This course is chaired by Dr Peter Flynn and taught by Dr Peter Murray-Rust, Dr Cigdem Sengul, Dr David Shotton, and Graham Klyne.

Classes for 2019

The Trends and Transients course runs on

Open Scholarship and Open Citations – present problems and future feasibilities

Taught by David Shotton.

In this presentation, I will shift the focus from the detailed implementation of
XML to consider the benefits that flow from making bibliographic metadata
available in machine-readable form. Specifically, I will discuss this in
relation to scholarly citations, and the work of OpenCitations, a small
independent scholarly infrastructure organization dedicated to open scholarship
and the publication of open bibliographic and citation data by the use of
Semantic Web (Linked Data) technologies.

Following a discussion of the stages and benefits of Open Scholarship, I will
discuss the present transitional state of academic publishing. I will then
compare the semantics of XML and RDF, the ‘language’ of the Semantic Web, which
I will introduce using a simple example, and will then describe the SPAR
(Semantic Publishing and Referencing) Ontologies that can be used to describe
all aspects of the scholarly publishing domain.
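The XML/RDF comparison can be made concrete with a small sketch. The following Python (standard library only) expresses the same bibliographic fact first as an XML fragment, whose meaning depends on an external schema, and then as RDF triples in N-Triples syntax, where the meaning is carried by explicit predicates. The dcterms:title and cito:cites property URIs are real vocabulary terms (CiTO is one of the SPAR Ontologies), but the DOIs used here are purely illustrative.

```python
import xml.etree.ElementTree as ET

# 1. XML: meaning is implicit in the element hierarchy and its schema.
article = ET.Element("article")
ET.SubElement(article, "title").text = "Open citations"
ET.SubElement(article, "doi").text = "10.1038/502295a"
xml_form = ET.tostring(article, encoding="unicode")

# 2. RDF: meaning is stated as explicit subject-predicate-object triples.
#    The property URIs are real DCTerms/CiTO terms; the DOIs are illustrative.
doi = "<http://dx.doi.org/10.1038/502295a>"
triples = [
    (doi, "<http://purl.org/dc/terms/title>", '"Open citations"'),
    (doi, "<http://purl.org/spar/cito/cites>",
     "<http://dx.doi.org/10.1000/example.1>"),
]
rdf_form = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

print(xml_form)
print(rdf_form)
```

Note how the second triple asserts a citation as data in its own right, rather than burying it in a reference list.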

I will then discuss the advantages of treating bibliographic citations as
first-class data entities, the Open Citation Identifiers that can be used to
identify open citations uniquely, and the OpenCitations Indexes we are building
to enable them to be searched and downloaded.
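The shape of an Open Citation Identifier is simple enough to sketch: per the published OCI definition, it is two numeric strings joined by a hyphen, the first identifying the citing work and the second the cited work, each beginning with a prefix naming the supplying database. A minimal parser, using a purely illustrative OCI value rather than a real citation:

```python
def split_oci(oci: str) -> tuple[str, str]:
    """Split an Open Citation Identifier into (citing, cited) parts.

    An OCI is two numeric strings joined by a hyphen; the first
    identifies the citing work, the second the cited work. The
    'oci:' scheme prefix is treated as optional here.
    """
    body = oci[len("oci:"):] if oci.startswith("oci:") else oci
    citing, cited = body.split("-", 1)
    return citing, cited

# Illustrative value only, not a real citation.
citing, cited = split_oci("oci:02001010861-02001010862")
print(citing, cited)
```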

Having mentioned the current collaborators and users of OpenCitations data and
services, I will outline our plans for the radical expansion of OpenCitations
required to enable us to provide a genuine open alternative to the current
monopolistic position of the two main commercial citation indexes, Web of
Science and Scopus.

I will conclude with a discussion of the requirements for sustainability of open
infrastructure organizations such as OpenCitations, and the various financial
models that might provide the funds to enable such sustainability.

Key references:

Silvio Peroni, David Shotton (2019) OpenCitations.

Silvio Peroni and David Shotton (2018) Open Citation: Definition. Figshare.

Silvio Peroni and David Shotton (2018) Open Citation Identifier: Definition.

David Shotton (2018). Funders should mandate open citations. Nature, 553: 129.

Silvio Peroni and David Shotton (2018). The SPAR Ontologies. In: Vrandečić D. et
al. (eds) The Semantic Web – ISWC 2018. ISWC 2018. Lecture Notes in Computer
Science, vol 11137. Springer, Cham.

Silvio Peroni, David Shotton, Fabio Vitali (2017). One Year of the OpenCitations
Corpus: Releasing RDF-based scholarly citation data into the Public Domain. In
The Semantic Web – ISWC 2017 (Lecture Notes in Computer Science Vol.
10588, pp. 184–192). Springer, Cham.

Silvio Peroni, Alexander Dutton, Tanya Gray, David Shotton (2015). Setting our
bibliographic references free: towards open citation data. Journal of
Documentation, 71 (2): 253–277.

David Shotton (2013). Open citations. Nature, 502 (7471): 295-297.

Copyright, XML, and the value of markup

Taught by Peter Murray-Rust.

In 1997–8 while helping to develop XML, I saw it as an opportunity to liberate
thought and communication. In this spirit, Henry Rzepa and I developed Chemical
Markup Language (CML), which has evolved into a fluid natural language of
objects, rather than a centrally-controlled DTD or schema.

However, XML in science publishing has become centralist and arcane. JATS does
not support authors — it removes creativity. Publishers who used to expose XML
now hide it, so the 20-year-old dream of reusable XML reinterpreted in the
browser is currently on hold.

But XML is a symptom, not the cause.

“Publishing” dominates science and constrains the research that people do. The
big publishers want to control how science is communicated, with a sacred
“version of record” in an unalterable PDF. But that’s not how creative
scientists think — Perelman communicated his Poincaré proof solely through
arXiv. It is the content, not the container, that matters.

Big science publishers are also expensive and unregulated. The cost of a
preprint on arXiv and other preprint repositories is about 10 USD: costs are
minimal because authors use Word or LaTeX to create semantic documents before
submission.

XML was meant to remove the friction of rekeying and typesetting, but it has
probably made it worse: the cost of a processed manuscript should be no more
than 250 USD but the current prices to publish chemistry articles in “high
impact” journals average 2,500 USD. The process disenfranchises authors and
readers (should blind readers have to read two-column PDF text and bitmaps?) and
actively prevents them from downloading sufficient material (10,000+ articles)
to do systematic reviews.

Many scientists can no longer publish in that way: it’s limited to the rich west
(universities) and has become an instrument of neo-colonialism. But publishing
*can* be inclusive, as the LatAm countries have demonstrated, most recently
through the AmeliCA initiative, which is JATS-XML for Latin America and the
Global South, defeating the subordination of the global conversation of science.

Can XML once again become an instrument for innovation and democratisation? I
hope to be informed by talking with delegates, and I shall give interactive
demonstrations of what XML can do if the political will is there. I shall
present from my own machine and make slides available immediately.

Authorisation in the Internet of Things

Taught by Cigdem Sengul.

While authentication and authorisation are basic security requirements,
implementing them in IoT (Internet of Things) environments can be challenging.
OAuth 2.0 is a standardised authorisation framework that allows the user to
participate in granting permissions to applications seeking user data, which
enables meaningful privacy control. Several standardisation working groups are
now extending OAuth 2.0 to address the diverse challenges of IoT environments.

The talk will be composed of four parts:

  1. A brief introduction to IoT and security challenges
  2. A brief overview of OAuth 2.0
  3. A review of ongoing work on OAuth 2.0-based standards for IoT
    • OAuth2 Device Authorisation Grant: Designed for internet-connected
      devices that either lack a browser or are input-constrained to the
      extent that requiring users to type text to authenticate is
      impractical.
    • UMA (User-Managed Access): Designed to enable a resource owner to
      control protected resource access by requesting parties in an
      asynchronous fashion.
    • ACE (Authentication and Authorisation for Constrained Environments):
      Designed for IoT environments based on a set of building blocks
      including OAuth2.0 and CoAP, and thus making a well-known and
      widely-used authorization solution suitable for IoT devices.
  4. Open challenges
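To make the Device Authorisation Grant concrete, here is a sketch of the device's side of the RFC 8628 flow. The server URL, endpoint paths, client id, and scope are placeholder assumptions, not any real server's API; only the grant-type URN is fixed by the standard.

```python
import json
import time
import urllib.parse
import urllib.request

AUTH_SERVER = "https://auth.example.com"  # placeholder authorisation server
CLIENT_ID = "demo-iot-device"             # placeholder client id
DEVICE_GRANT = "urn:ietf:params:oauth:grant-type:device_code"  # fixed by RFC 8628

def post_form(url: str, fields: dict) -> dict:
    """POST form-encoded fields and parse the JSON response."""
    data = urllib.parse.urlencode(fields).encode()
    with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
        return json.load(resp)

def build_token_request(client_id: str, device_code: str) -> dict:
    """The polling request the device repeats until the user approves."""
    return {"grant_type": DEVICE_GRANT,
            "device_code": device_code,
            "client_id": client_id}

def device_flow() -> str:
    # 1. Ask the server for a device code and a short user code.
    grant = post_form(f"{AUTH_SERVER}/device_authorization",
                      {"client_id": CLIENT_ID, "scope": "sensor:read"})
    # 2. Show the user code out-of-band (display, serial console, ...):
    #    the user enters it in a browser on another device.
    print(f"Visit {grant['verification_uri']} and enter {grant['user_code']}")
    # 3. Poll the token endpoint until the user approves.
    while True:
        time.sleep(grant.get("interval", 5))
        reply = post_form(f"{AUTH_SERVER}/token",
                          build_token_request(CLIENT_ID, grant["device_code"]))
        if "access_token" in reply:
            return reply["access_token"]

print(build_token_request(CLIENT_ID, "example-device-code"))
```

The key point for constrained devices is that no browser or keyboard is needed on the device itself: authentication happens on the user's phone or laptop.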

Linked Data in Digital Humanities

Taught by Graham Klyne.

The field of Digital Humanities explores the ways in which digital technologies
can be used to support and enhance humanities research. This session will be
about experiences of using linked data (RDF) and associated Semantic Web
technologies in Digital Humanities applications.

I shall start by discussing some ways in which the application of Semantic Web
technologies in the humanities may differ from their use in scientific or other
common applications. I shall then discuss applications from projects I’ve
been working on recently, showing how affordances of linked data can support
some particular requirements of humanities data:

1. MELD: Music Encoding and Linked Data (Fusing Audio and Semantic technology
(FAST) project). The FAST project is exploring the intersection of
semantic and music-related technologies. I shall talk about the MELD framework,
which uses linked data in applications that establish and act upon connections
between musical structure, music-related media and other data.

2. EM Places: Early Modern place information (Cultures of Knowledge (CofK)
project). The EM Places project is building an online reference
resource of place information from Early Modern Letters Online (EMLO). I shall
discuss the development of a data model that can capture historical contextual
information while also incorporating up-to-date information from other sources
of place information.
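As a sketch of the kind of linked-data affordance described above, combining historical attestations with links to current gazetteers, the following prints a handful of triples for an imaginary place record. All URIs and property names here are invented for illustration; the actual EM Places model defines its own vocabulary.

```python
# Purely illustrative triples for an imaginary place record; the "ex:"
# property names and URIs are invented, not the EM Places vocabulary.
place = "<https://example.org/place/1>"
triples = [
    # A current preferred name.
    (place, "rdfs:label", '"Exampleton"'),
    # A historical name, attested only for a given period.
    (place, "ex:attestedName", '"Exampletowne"'),
    (place, "ex:attestationPeriod", '"1580-1650"'),
    # A link that lets up-to-date data be pulled from another gazetteer.
    (place, "owl:sameAs", "<https://example.org/gazetteer/42>"),
]
for s, p, o in triples:
    print(s, p, o, ".")
```

Because the historical name carries its own attestation period as data, it can coexist with the current name rather than overwriting it, which is exactly the requirement that flat gazetteer records struggle with.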

The session will focus on digital humanities applications, but some of the
experiences may be more broadly relevant. Technically, the session will focus on
data modelling issues rather than specific details of linked data formats or
storage systems.