Once upon a time, at the start of the World Wide Web, the Internet was a much more decentralized ecosystem. Many sites, web applications and email services were hosted on localized hardware, with servers run within companies, small offices or even homes. However, running these systems required specialized knowledge, and server administrators soon became costly. This was followed by a rise in specialized hosting, where users could set up email and web applications with the click of a button and have the entire software system managed for them. Users gave up control, but in return they saved on the costs associated with traditional technology professionals.
Over the years, hosting companies merged and new ones sprang up, offering software as a service (SaaS) and infrastructure as a service (IaaS). These services became collectively known as The Cloud; they were deployed in enormous data warehouses, each controlled by a single company. Some companies provided these software services (email, web hosting and document management) for free, in return for access to, and profiling of, the personal information stored within their data centers.
Slower-moving sectors such as education, government, health and social services have begun to migrate to the cloud, seeing it as a cheaper, easier way to deploy software services. Entities in these sectors have started to move much of their data to these cloud systems, and national organizations have begun to mandate the centralization of information. On the surface this may look like a good idea, but for educational and archiving systems it could have a detrimental effect, both in the short and the long term.
The Case Against Centralization of Archived Data
There are a number of drawbacks to centralizing data. In some academic circles we have seen the introduction of catch-all archives, and the results have not been positive.
A single point of failure
The most obvious issue with any cloud-based solution is a single point of failure. A centralized solution only works when it is running; if anything brings the system down (a security breach, a bug in the code, overwhelming traffic), everyone is affected.
Security is also a big concern; an attack by a malicious entity could bring down the entire system. Targeting a centralized system is much simpler than having to attack a network of decentralized servers and applications.
However, failure doesn’t necessarily need to be confined to the technology. A centralized system usually means fewer people working within the ecosystem: fewer developers, fewer system administrators and a much smaller (or in some cases non-existent) support community. We have seen centralized services with a single point of contact; if that contact leaves or no longer wishes to support the system, there is no longer anyone to turn to. With no support, the centralized system is no longer usable.
One size fits all
One centralized system we have had experience with requires all end-users to submit items directly to a single super-archive. This super-archive assumes all information submitted will be the same and so mandates this single submission process even though users are submitting a variety of information.
To compound the problem, the ecosystem cultivated around this centralized provider promotes a third-party repository for storing each organization’s archive. However, there is no differentiation between archives: they all look the same, they are all housed under the same URL, and there is no method for integrating the stored data into other applications or solutions.
This one-size-fits-all approach stifles innovation and results in a lacklustre user experience. It also means organizations are unable to disseminate their own information in a way that works for them.
Lack of competition stifles innovation
By mandating the centralization of all archives with a single host there is no longer any incentive for the controlling provider to innovate; all competition has been eliminated. We were recently astonished to see a university with an innovative, creative archive solution choose to throw it away in favour of a generic, centralized solution which makes no effort to differentiate repositories from one another and provides no ancillary services, instead simply being a receptacle for digital data; somewhere to tuck important information away and forget.
Too much control in the hands of a few
Centralizing all data within a single super archive means a single entity controls all of the data as well as the flow of information to those who require it.
Centralizing data gives a very small group, or even a single organization or individual, too much power over the information being archived. This leads to laziness, as the few in control only need to ensure the bare minimum is achieved to maintain the status quo.
Decision-making can often be corrupted, because organizations wishing to pervert the process for their own gain only need to approach a select few. Worse still, decision makers may themselves be associated with organizations providing services to the centralized platform, enabling them to implement decisions that are not in the best interests of the system’s users. There is often no recourse to challenge these decisions, and end-users are left having to change their workflows and processes to meet the new requirements of the centralized system, usually at great cost.
Centralized systems are often closed. End-users are at the mercy of the centralized system and are left to play catch-up when decision makers change it without warning. We’ve experienced this first-hand, watching our own software suddenly stop working because the centralized system changed underneath it.
So what’s the solution…
With the rise of blockchain technologies, we’re witnessing the decentralization and democratization of all types of information. As the concepts of blockchain take hold, we’ll see end-users, consumers and creators take back control of the data currently stored in these centralized systems.
That said, it will take some time before we transition over to a fully decentralized environment. In the meantime, it is important for organizations to have complete control over their information; how it is submitted, stored and consumed. Organizations should not be reliant on centralized systems, especially those that exert too much control over the archival ecosystem.
For users of web applications such as DSpace, it is important that each organization, or department within an organization, deploy its own software and use it as the primary repository of its digital data.
Federated repositories are important; they provide a single point from which information can be searched and consumed. However, they should not be the primary data store, nor should users be required to submit directly to them. Instead, each organization should keep its own master copy of its information, and the federated entity should harvest and ingest that information using open standards and tools.
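In practice, "open standards" for this kind of harvesting usually means OAI-PMH, the protocol DSpace exposes for exactly this purpose. The sketch below shows how a federated aggregator might pull Dublin Core records from an organization's own repository; note that the endpoint URL and function names here are our own illustrative assumptions, not part of any DSpace API.

```python
# Sketch: a federated aggregator harvesting metadata from an organization's
# own repository via OAI-PMH. DSpace and most repository platforms expose an
# OAI-PMH endpoint; the URL used in the usage example below is hypothetical.
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH and Dublin Core standards.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def parse_records(xml_text):
    """Extract (identifier, title) pairs from an OAI-PMH ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI + "record"):
        header = record.find(OAI + "header")
        identifier = header.findtext(OAI + "identifier")
        # Take the first dc:title found in the record's metadata, if any.
        title = next((t.text for t in record.iter(DC + "title")), None)
        records.append((identifier, title))
    return records

def harvest(base_url, metadata_prefix="oai_dc"):
    """Fetch one page of records from a repository's OAI-PMH endpoint.

    A full harvester would also follow resumptionToken elements to page
    through large result sets; that is omitted here for brevity.
    """
    url = f"{base_url}?verb=ListRecords&metadataPrefix={metadata_prefix}"
    with urllib.request.urlopen(url) as resp:
        return parse_records(resp.read())

# Usage (hypothetical endpoint; real DSpace sites serve OAI-PMH at a
# path such as /oai/request on the organization's own domain):
#   records = harvest("https://repository.example.org/oai/request")
```

Because the organization holds the master copy, the aggregator can re-harvest at any time, and nothing prevents a second federated service from harvesting the very same endpoint.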
A recent blog post nicely highlights the benefits of decentralized wisdom over the governance of a few. I think the same benefits apply to a decentralized archiving solution over a centralized one.