Digital Preservation Bake Off Challenge

Are you ready to face the challenge?!

Introduction

The Digital Preservation Bake Off Challenge is framed as an open, light-hearted competition in which solution providers, developers and coders demonstrate their products and tools while participants observe the process and verify the claims being made.

There’s no actual baking.

The term ‘Bake Off’ describes a technology showcase in which a group of ‘judges’ (the ‘Tasting Panel’) sets a common challenge, based on a shared set of inputs and requirements, for a group of technologists (the ‘bakers’). The requirements are based on real-world situations that are relevant enough to be recognized across the global digital preservation community. They may relate to any number of digital preservation themes, such as integrating digital preservation into line-of-business systems, thorny content types, metadata extraction and handling, meeting user requirements, scaling up, web archiving, or achieving greater efficiency without sacrificing the quality of outcomes. The challenges are open both to end-to-end systems produced by large teams and to specific tools and applications from individual hackers and developers.

The Great Digital Preservation Bake Off will be staged in three rounds, which, perhaps inevitably, we are describing as ‘Courses’:

  • The Starter Course includes all the actions that happen before a digital object is transferred to or ingested into an archive repository;
  • The Main Course includes actions once a collection has been transferred into an archive repository;
  • The Dessert Course concerns all aspects of access and user engagement with data that has been preserved.

Challengers are welcome to participate in as many courses as they like. Each course involves a series of tasks, performed with a common menu of ingredients in front of an expert ‘Tasting Panel’ made up of digital preservation experts drawn from the iPres Program Committee and the audience.

In the lead-up to the iPres 2022 Digital Preservation Bake Off, bakers can decide which courses they want to participate in and which of their standard toolkits and workflows they would like to demonstrate. Challengers can choose which ingredients to work with, allowing them to specialize in their favorites, or they can attempt ‘The Works’. They can demonstrate a complete solution, a single tool or a chain of tools assembled into a workflow.

During the Bake Off sessions at iPres 2022, the bakers will demonstrate their results to the judges as well as to the on-site and remote audience. Three ‘mystery ingredients’ will be revealed during the session; while there is no requirement to use them, we strongly encourage bakers to refer to one or more of them in their demos. After each demonstration, the audience will have the chance to ask questions and give feedback, after which the Tasting Panel will judge the success of the dish and award prizes accordingly. In this way, bakers can demonstrate their systems and tools on real-world problems selected by the community, and learn from the experience. The Bake Off will be more interactive and interesting than a standard pre-prepared tool demo!

The Tasting Panel will take into account that large integrated solutions are pitched against small but tightly focused implementations. To extend the metaphor, they will understand the difference between a gourmet banquet in a fine restaurant and a perfect cup of coffee which sets you up for the day. The Panel will also take on board feedback from the audience, including from the challengers themselves.

The iPres 2022 Pantry: the Data Set

Every dish is prepared using a shared Data Set called the ‘iPres 2022 Pantry’. The Data Set includes a number of different content types, ranging from generic and not-so-generic PDFs, still images and office documents, to complex objects such as AV, 3D and disk images, to web-based objects such as websites and social media.

In addition, the pantry includes an ‘exotic ingredients’ section. This section contains data with additional challenges, such as unidentifiable objects, corrupt objects or legacy file formats.

Challengers can pick and choose from the pantry as they like.

Starters: The Pre-Ingest Aperitif

This course is all about assessing and recovering a resource as a first step in the preservation process. Challengers can demo any process relating to pre-ingest, such as preparing a collection for ingest into a digital preservation system, generating metadata or undertaking quality assurance in the construction of an Information Package.
A “dish” consists of one or a combination of the following tasks:

  • Appraisal
  • Cyber Security / Virus checking
  • Data protection risk assessment
  • Data recovery
  • De-duplication
  • Disk imaging
  • File format identification
  • File format characterization
  • File format validation
  • Harvesting content / metadata from the web
  • Metadata extraction and/or creation
  • Resolving filename encoding issues
  • Technical protection measures 
  • Any other process relating to pre-ingest – there are so many great cuisines and we might have forgotten dishes! If challengers include other dishes, they should be prepared to explain why the process is an integral part of pre-ingest.

Main course: The Preservation Plat du Jour

This course is all about what we do with digital objects within an archive; in other words, core preservation actions at scale on a chosen set of collections. Challengers can demo any process relating to preservation planning/action and data management, such as generating preservation information during ingest, risk management or migration.
A “dish” consists of one or a combination of the following tasks:

  • Audit/Provenance trail information
  • Emulation planning and preparation
  • Fixity checking
  • Ingest
  • Metadata enrichment
  • Migration
  • Multi-copy management
  • Normalization
  • Preservation metadata (e.g., PREMIS)
  • Preservation management / planning / watch
  • Replication
  • Risk management
  • Any other process relating to core preservation functions – there are so many great cuisines and we might have forgotten dishes! If challengers include other dishes, they should be prepared to explain why the process is an integral part of preservation functions within an archive

Dessert: Access is So Sweet

This course is all about access to preserved objects under different conditions. Challengers can demo any process related to managing the information needed for access, or show how access is facilitated for different object/content types in light of different legal/organizational requirements.

A “dish” consists of one or a combination of the following tasks:

  • Access to sensitive data / Data protection
  • Access rights management
  • Advanced resource discovery (content based image / video / sound)
  • Automation of access
  • Computational access
  • Audit/Provenance trail provided on access
  • Emulation for access
  • Migration on the fly
  • Proving authenticity on access
  • Presentation of complex objects
  • Preservation Information provided on access
  • Redaction
  • Resource discovery (catalogue)
  • Universal accessibility to legacy data
  • Updating information packages with metadata generated by external users during access (“Crowd Sourcing”)
  • Any other process relating to access – there are so many great cuisines and we might have forgotten dishes! If challengers include other dishes, they should be prepared to explain why the process is an integral part of access within an archive

Things to consider during baking / demo

While no points will be awarded during the Bake Off, the judges will give feedback to each challenger. In particular, they will look at the following points, which challengers may want to take into account when preparing their workflows:

  • Difficulty of data used from pantry / data set
  • Number of tasks completed
  • Quality of outputs
  • Clever insight
  • On-the-spot problem solving
  • Making it look easy
  • Making it actually easy
  • Time / cost involved
  • Quality of task logging
  • Including the mystery ingredients
  • Reproducibility of the result/solution
  • Bonus: ‘Doggybag challenge’ – show how your dish can be preserved!

Principles behind presenting / baking

  • Tools and solutions of all sizes are welcome!
  • Presentation can be on site / remote 
  • Presentation can be a live demo or a pre-recorded demo
  • No slides! (After all, we’re asking you for a full dish and not an Insta-Post of what we could be eating)
  • 10 minutes per baker including audience Q&A
  • Each baker can participate in 1, 2 or all 3 courses
  • It is not necessary to be the developer of a new tool / software; you can also introduce existing tools in new, innovative ways, e.g. through repurposing, automating, orchestrating, etc.
  • If you have developed / created a tool and people are using it, this session is probably for you 😉

Are you ready to face the challenge?!

If you would like to join the Digital Preservation Bake Off, please send your contact details, the name of your software / workflow / tool and the course(s) you would like to participate in to ipres2022@in-conference.org.uk by September 1st. Please note that you need to be registered for the iPres 2022 conference in order to join the Bake Off. If you haven’t registered yet, you can still register for online attendance at iPres 2022 and challenge us in the Bake Off.

Take a look at the newly released data set in the well-stocked iPres 2022 Pantry.

We will hold an orientation meeting for all challengers two weeks before iPres 2022 to allow for Q&A.

The Tasting Panel

  • Micky Lindlar (Chief Connoisseur) – TIB 
  • Jen Mitcham – DPC
  • Klaus Rechert – HS Kehl
  • Tracy Seneca – UIC
  • Leontien Talboom – CUL
  • Kieran O’Leary – NLI