I’ve spent a large portion of the past couple of years working with a local discovery layer (Aquabrowser) and am currently investigating equivalent ‘webscale’ discovery index solutions such as Summon, Primo Central or EBSCO Discovery that may supplement or replace it.
I’ve occasionally found myself explaining the two solutions to non-library techy or developer colleagues. When we discuss the large webscale indexes such as Summon, folk have on more than one occasion asked me: why not do this yourself? It’s just Lucene and Solr (or Elasticsearch/Sphinx) scaled up … right?
Not exactly. Here are three reasons why …
For this to happen, we would firstly need to get our hands on the data / full text indexed by the commercial solutions. This is no easy task.
Web scale suppliers have signed up publishers to partnership programmes to allow for harvesting and crawling of content. Agreements are most likely bilateral rather than universal, and no real standard yet exists for this interchange. It’s easier and probably cheaper (at least initially) for me to buy into someone else’s hard work here. But this is itself rather dangerous: it amounts to quite a serious outsource and a potential loss of control. The only real way to influence change could be to switch vendor.
The Library Loon has recently commented on two important open data releases from Nature and OCLC and the potential impact these can have on discovery services. If open data in libraries really needs a better use case, this is surely it.
The problem is, the data that is most valuable is the stuff libraries themselves do not own. It would be great to see more publishers follow Nature’s example and an end to the silly games of withholding data from competitor services.
Take your ‘just Lucene / Solr’, ingest and normalize varied data from 200 to 300+ different sources, scale for hundreds of thousands of concurrent users and accommodate well over fifty million records. Then keep it mirrored worldwide with 24×7 uptime.
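To give a flavour of the normalization step alone, here is a minimal Python sketch. The two source names, their field layouts and the `Record` schema are all invented for illustration; none of this is taken from any vendor's actual pipeline.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical common schema that every source gets mapped onto."""
    source: str
    title: str
    authors: Tuple[str, ...]
    year: Optional[int]


def parse_year(value) -> Optional[int]:
    """Return the first four-digit run found in the value, or None."""
    match = re.search(r"\d{4}", str(value or ""))
    return int(match.group()) if match else None


def from_publisher_a(raw: dict) -> Record:
    # Assumed shape: authors in one ';'-separated string, date as ISO text.
    authors = tuple(a.strip() for a in raw.get("creator", "").split(";") if a.strip())
    return Record("publisher_a", raw.get("title", "").strip(),
                  authors, parse_year(raw.get("date")))


def from_publisher_b(raw: dict) -> Record:
    # Assumed shape: authors already a list, year as a bare integer.
    authors = tuple(a.strip() for a in raw.get("authors", []) if a.strip())
    return Record("publisher_b", raw.get("article_title", "").strip(),
                  authors, parse_year(raw.get("pub_year")))


# One mapper per source: this table is the part that grows with every feed.
MAPPERS = {
    "publisher_a": from_publisher_a,
    "publisher_b": from_publisher_b,
}


def normalize(source: str, raw: dict) -> Record:
    """Dispatch a raw record to the mapper registered for its source."""
    return MAPPERS[source](raw)
```

Two toy sources already need two hand-written mappers. Multiply that by a few hundred feeds, each with its own quirks and changes over time, and the maintenance burden starts to look rather different from ‘just Solr’.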
Again, for a single library, the cost of an annual sub and the startup costs for a DIY service are simply not comparable.
So why not seek a partner, I may be asked? This brings me onto …
3) Management and ownership
Collaboration is hard, especially at an institutional level (2+2=3, etc.). I recently read this long but fascinating insight into the running of various web portal services such as Intute, and how with a bit of dynamic thinking they ‘could’ have morphed into a service such as Summon. It’s a very personal piece, although I do agree about the inherent silliness of trying to catalogue even ‘the best of the web’; it’s simply not relevant to search-engine-centric web usage.
From what is described there, the budgets and resources were in place to potentially attempt this. But this was six years ago, pre-crash when we had money. Things in the UK HE sector are very different now.
With a change in operation of the JISC, it’s not exactly clear who could take this on, although it is possible that if some future combination of Archives Hub and COPAC started to absorb open data from publishers, it may evolve on its own.
Does it matter?
Right now we have three large commercial players in the library web-scale market, all in close competition. Hopefully, this should be enough to keep things fresh and current.
I have argued that the lack of development over the past 20 years in LMS products, especially with the OPAC, has assisted in the marginalization of library services. So that this is not repeated, I would again agree with the Library Loon and hope web scale discovery service vendors continue to grow and innovate with their products, and rely less on the coverage of material as a selling point. Summon has recently launched discipline-centric searches and Primo Central has some fascinating ideas around relevancy ranking that takes user context into account. I hope this trend, at least, continues.