Good news everybody! It seems Marc21 is dead (or has at least been told to order its last meal). Last week, the Library of Congress (LOC) working group on the future of bibliographic control announced that:
> the Library community’s data carrier, MARC, is “based on forty-year-old techniques for data management and is out of step with programming styles of today.” The Working Group called for a format that will “accommodate and distinguish expert-, automated-, and self-generated metadata, including annotations (reviews, comments, and usage data).” The Working Group agreed that MARC has served the library community well in the pre-Web environment, but something new is now needed to implement the recommendations made in the Working Group’s seminal report. In its recommendations, the Working Group called upon the Library of Congress to take action. In recommendation 3.1.1, the members wrote:
>
> “Recognizing that Z39.2/MARC are no longer fit for the purpose, work with the library and other interested communities to specify and implement a carrier for bibliographic information that is capable of representing the full range of data of interest to libraries, and of facilitating the exchange of such data both within the library community and with related communities.”
>
> With these strong statements from two expert groups, the Library of Congress is committed to developing, in collaboration with librarians, standards experts, and technologists, a new bibliographic framework that will serve the associated communities well into the future. Within the Library, staff from the Network Development and Standards Office (within the Technology Policy directorate) and the Policy and Standards Division (within the Acquisitions and Bibliographic Access directorate) have been meeting with Beacher Wiggins (Director, ABA), Ruth Scovill (Director, Technology Policy), and me to craft a plan for proceeding with the development of a bibliographic framework for the future.
Such news honestly fills me with joy, though I may need to reword some forthcoming talks. Lots of people have been tweeting and blogging, but Roy Tennant at Library Journal is surely allowed to celebrate the most; after all, he called for this nearly ten years ago.
Marc21 is more than a container format. Along with AACR2 (and now RDA), it's a whole set of syntaxes, standards and working practices that represent a ‘transcriptive approach’ to metadata creation, designed to generate a card-catalogue record. This approach has never worked satisfactorily in the networked environment and has given modern library programmers and hackers hours of pain.
Some thoughts about what may come to replace it …
Is Linked Data / RDF the right choice?
The LOC statement indicates a preference for Linked Data / RDF, but does not draw a distinction between the two. One is an idea; the other is a data model (with several serializations) that can be used to express that idea. Still, RDF remains the most popular way of producing linked datasets.
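For readers who have not met RDF, the core idea can be sketched in a few lines of Python. This is an illustrative sketch only: the book and person URIs are invented (only the Dublin Core and FOAF predicate URIs are real vocabulary terms), and real RDF work would use a proper library and a serialization such as Turtle.

```python
# A minimal sketch of the RDF idea: statements are subject-predicate-object
# triples, with URIs acting as globally unique identifiers. The example.org
# URIs here are made up for illustration.
triples = [
    ("http://example.org/book/moby-dick",
     "http://purl.org/dc/terms/title",
     "Moby Dick"),
    ("http://example.org/book/moby-dick",
     "http://purl.org/dc/terms/creator",
     "http://example.org/person/herman-melville"),
    ("http://example.org/person/herman-melville",
     "http://xmlns.com/foaf/0.1/name",
     "Herman Melville"),
]

def objects(subject, predicate):
    """Return every object asserted for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Because the creator is a URI rather than a text string, we can follow
# the link to further statements about the same resource.
author = objects("http://example.org/book/moby-dick",
                 "http://purl.org/dc/terms/creator")[0]
print(objects(author, "http://xmlns.com/foaf/0.1/name"))  # ['Herman Melville']
```

The "linked" part is that last step: data about the author lives in its own statements, discoverable by following the URI rather than by parsing a name string out of a record.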
Has the Library of Congress made the right choice? Far too early to say. It's down to them to evaluate the tech, which is why they will be consulting. Some people will say that the LOC is a bit behind, and that linked data is a ‘has-been’ technology, a dead duck. They may suggest some popular current tech alternatives such as:
- Schema.org / HTML5 microdata formats. Right now, this is not really the same use case as Marc, although a Marc replacement should be able to translate easily into this sphere. In some respects, for cultural heritage and research, what Google is doing is almost immaterial, as the web exists and extends well beyond search and advertising (and IMHO DuckDuckGo is generally a better search engine for many research purposes). Microdata right now is aimed at commercial applications and at getting better sales links out there. A richer academic / cultural heritage application would be useful, but would need to be well adopted.
- NoSQL databases are great for varied types of data and are a natural fit for bib data, but they are just database software, just as plain and simple JSON is a great container format and only that (ditto with plain and simple XML). Anyone using such tech as an excuse for unstructured data will find structure inevitably creeps in. One day, they may want to look for a schema or standard to help simplify things…
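The "structure inevitably creeps in" point is easy to make concrete. In the sketch below (field names invented for illustration), JSON happily carries any record shape, which means the field names themselves become an implicit, unenforced schema:

```python
import json

# JSON carries any shape of bibliographic record -- but the field names
# are an implicit schema. Two cataloguers who pick different names
# ("creator" vs "author") have already diverged, and JSON won't object.
record = {
    "title": "On the Origin of Species",
    "creator": "Charles Darwin",  # a colleague might have used "author"
    "date": "1859",
}

serialized = json.dumps(record)
restored = json.loads(serialized)
assert restored == record  # round-trips perfectly, but enforces nothing
```

The container does its job flawlessly; agreeing on what goes inside it is the part that needs a standard.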
We in cultural heritage really need some level of schema and data structure to work with from the get-go: a base set of fields with well-defined meanings, commonly understood by people on opposite sides of the globe doing the same job. We need some defined, controlled way of filling these fields with text. In terms of subject and name authority control on a global scale, linked data has such obvious advantages that it needs serious consideration.
Then we can wrap them up in sexy JSON and load them into our funky MongoDBs. Technology should not dominate the conversation here, but it should be seen in perspective. We have a lot more flexibility, choice and freedom than we did 40 years ago when Marc21 was created.
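The authority-control advantage is worth a quick sketch. This is illustrative only: the URI below is a made-up placeholder, not a real authority identifier (services like the LOC's id.loc.gov publish the real ones).

```python
# Sketch of shared authority control: rather than each library typing an
# author's name its own way, records point at one shared URI.
# The URI below is a made-up placeholder, not a real identifier.
AUTHORITY = "http://example.org/authorities/twain-mark"

library_a = {"creator": AUTHORITY, "creator_label": "Twain, Mark, 1835-1910"}
library_b = {"creator": AUTHORITY, "creator_label": "Mark Twain"}

# Different display strings, same entity: matching records across
# institutions becomes an exact comparison, not fuzzy string matching.
print("same entity:", library_a["creator"] == library_b["creator"])
```

On a global scale that exact-match property is the prize: two catalogues on opposite sides of the world can agree they mean the same person without ever agreeing on how to spell the name.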
How does this tie into the major library system vendors?
Details on next-gen LMS systems are thin on the ground. Serials Solutions are apparently building a web-scale management system around linked data. Carl Grant has indicated that Ex Libris Alma has hooks for linked data, presumably URIs for record nodes, which seems a prudent choice. He argues that RDF linked data still needs to find its killer app. Maybe library management is it? Imagine records that catalogue themselves by following links in data to generate new access points…
OCLC have ideas in this direction and have been experimenting with linked data. Nothing much yet other than data, though.
This announcement may be timely for some development cycles, less so for others. I would suggest that LMS vendor take-up of any new standard, in at least an import/export/creation capacity, will be vital to product success as long as librarians still care about data standards. I could be wrong, though.
UK experience with RDF / Linked data
The UK has a slight edge over the US, thanks in part to the initial work of the Discovery programme. The British Library BNB is available as linked RDF and would arguably act as an ideal test platform for examining many of the issues that might arise during standards formulation. The Open Bibliography project has led the way in exploring open licensing.
That the UK community has largely recognized the need for permissive licensing (CC0 / PDDL) around linked data is perhaps the main thing to shout about at this stage. When navigating links, coming up against a license wall that stops re-use could make life really difficult.
Do we need complexity?
One of the myths we really need to blow open is that libraries need and use rich and complex metadata even for everyday needs. We really don’t.
We need a baseline standard that is easy for staff and readers to understand, and easy to implement and get right. This will be easily shareable and usable outside of ‘libraryland’.
The evidence? According to OCLC Research, only 10% of all Marc tags in Worldcat appear in 100% of all Worldcat records. 65% of tags appear in less than 1% of records. Basically, most of it is unused. The standard is bloated. Think about all those meaningless icons in MS Word…
Extensible standards such as Dublin Core and flexible RDF vocabularies would allow for complexity to be included when needed and ignored when not, in a way Marc does not. To paraphrase Owen Stephens at a recent JISC event, an attempt to rebuild the Marc tagset in RDF whilst ignoring existing vocabularies would be an abject failure, along the lines of MarcXML (‘the worst of both worlds’).
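The "complexity when needed, ignored when not" idea can be sketched as follows. The record structure and the extension vocabulary below are invented for illustration; only the Dublin Core predicate URIs are real:

```python
# A base record uses a handful of Dublin Core terms; richer vocabularies
# can be layered on as extra predicates without disturbing consumers
# that only read the base set.
DC = "http://purl.org/dc/terms/"

base_record = {
    DC + "title": "A Child's Garden of Verses",
    DC + "creator": "Stevenson, Robert Louis",
    DC + "date": "1885",
}

# A specialist adds richer data under another (made-up) vocabulary;
# simple consumers just ignore predicates they don't recognize.
rich_record = dict(base_record)
rich_record["http://example.org/vocab/bindingNote"] = "quarter-bound in cloth"

def simple_view(record):
    """Keep only the commonly understood Dublin Core fields."""
    return {p: o for p, o in record.items() if p.startswith(DC)}

assert simple_view(rich_record) == base_record
```

Contrast this with Marc, where every tag sits in one monolithic standard whether your workflow needs it or not.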
How can we involve others?
Making the standard or approach useful to a wider community beyond ‘libraryland’ will be vital to its success. The statement seems to recognize this, but is it enough to leave its ownership in the hands of librarians and the LOC alone?
Karen Coyle is again the voice of reason, arguing repeatedly and quite practically that if the Library of Congress wants a truly useful open standard accepted beyond libraries, it needs to open up its formulation, management and ownership to a wider body. She draws attention to NISO’s offer to take ownership of the work.
I tend to agree, and hope the LOC steps back here. NISO knows standards and how to manage change. Tying into this blog's emerging wider theme, it's also a chance for everyone (vendors, libraries and publishers) to bang heads and innovate on the same page. Interesting times ahead.