Collections Link

Wednesday, Apr 23rd

In the Loop - 2 days with the Smithsonian Digitization Programme

Collections Trust CEO Nick Poole reflects on two days working with the Smithsonian Institution in Washington DC on their Digitization programmes.

The Smithsonian Institution is one of the world's most iconic cultural heritage institutions. Comprising 28 museums, galleries and research centres and a fully-fledged zoo, it is situated at the very heart of the psychological map of the United States - lining the National Mall which connects Capitol Hill and the Lincoln Memorial in Washington DC.

To say that the Smithsonian's collections are diverse is to understate the sheer scale and complexity of the challenge they face: an estimated 137m objects, ranging from the Space Shuttle to beetles and butterflies, supported by some 137,000 cubic feet of archival records. Although they operate as a federation with institute-wide policies and priorities, each individual venue also operates with a fair degree of autonomy.

So how does an organisation of this scale and complexity confront the digital opportunity with creativity and confidence? How can the Smithsonian use technology to deliver on and extend the impact of its strategic mission of 'The increase and diffusion of knowledge'?

Smithsonian Digitization Programme - Facts & Figures

  • The Smithsonian holds 137m objects and 2m library volumes in its collections
  • 14m objects have been prioritised for Digitization, including type and representative specimens from the Collections
  • 1.3m objects have already been digitized, representing around 9% of the priority group
  • The Smithsonian holds 137,000 cubic feet of archival material
  • 86,000 cubic feet (63% of the total) of archival holdings have been selected for digitization
  • Circa 24,000 cubic feet has already been digitized (28% of priority group)
  • The Smithsonian has currently created 7m electronic records about its object collections
The material already digitized can be accessed online at
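
The percentages in the facts and figures above follow from simple ratios. As a rough check, here is a minimal Python sketch that recomputes them from the quoted totals. The variable names and the `share` helper are my own, and the inputs are the rounded figures from the list, so the results are only approximate.

```python
# Figures quoted in the list above (objects in millions, archives in cubic feet).
total_objects_m = 137.0
priority_objects_m = 14.0
digitized_objects_m = 1.3

total_archives_cuft = 137_000
priority_archives_cuft = 86_000
digitized_archives_cuft = 24_000


def share(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of all objects that sit in the priority group.
priority_share = share(priority_objects_m, total_objects_m)
# Progress through the priority group of objects.
object_progress = share(digitized_objects_m, priority_objects_m)
# Share of archival holdings selected for digitization.
archive_priority_share = share(priority_archives_cuft, total_archives_cuft)
# Progress through the priority group of archives.
archive_progress = share(digitized_archives_cuft, priority_archives_cuft)

print(priority_share, object_progress, archive_priority_share, archive_progress)
```

The recomputed values sit close to the rounded percentages quoted in the list, which suggests the published figures are internally consistent.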

Those who follow the development of the Digital agenda in museums will already be familiar with some elements of this strategy. In 2010, the Smithsonian Institution published its Digital Strategy. This was followed in 2009/10 with Mike Edson's brilliant, challenging vision of the Smithsonian Commons, which ignited discussion in heritage communities around the world about the challenge and opportunity of taking a more open approach to publishing data.

CEO Nick Poole at the Smithsonian with Michael Edson and Gunther Waibel

Since then, the Smithsonian has focussed increasingly on Digitization as the process through which many of its digital ambitions will be achieved. Leading on this immense Digitization programme is my old friend and collaborator Gunther Waibel (@GuWa on twitter) along with a team located within the Chief Information Officer's department.

Gunther is a smart guy, and knows that the best way to eat a whale is in small bites. So he has created a strategy which divides the problem into realistic chunks and enables the organisation to focus its capacity, operations and fundraising effort on a set of smaller, clearly-defined goals. To facilitate this process, and to secure the contribution and participation of his colleagues across the Smithsonian's many specialist departments, Gunther organised the 'Smithsonian Digitization Fair' - a one-day event of talks, workshops, seminars and displays covering the huge breadth of the Smithsonian's existing digital activity.

Gunther was kind enough to invite me to speak at the Digitization fair, and on the previous day I had the opportunity of spending some time with him and the internal consultancy team he is working with to try and quantify the cost of Digitizing the Smithsonian.

It was a fascinating meeting - as far as I know, the Smithsonian is pretty unique in having access to a team of internal specialists dedicated to bringing research and statistical analysis to bear on the strategic challenges faced by the Institution. I was able to speak with them about the strengths and weaknesses of our research (The Cost of Digitising Europe's Heritage) and the challenges of defining scope (ie. what gets included in the cost model) and the benefits of internal capacity-building over external outsourcing.

After this, I had a chance to attend a workshop led by Mike Edson, attended IRL by teams from all over the Smithsonian (and followed by more via the livefeed, an archive of which is available online). Like many museums, the Smithsonian is wrangling with some very fundamental questions. They want to provide online access to their collections to deliver on their public mission. At the same time, they know that the material needs to be preserved and that this might mean trying to monetise some of it.

The Smithsonian's collections are hugely diverse. This display of Unmanned Aerial Vehicles is located at the Air and Space Museum in Washington

It was fascinating to get the insights of Smithsonian colleagues on some of these challenges - there was an interesting discussion (echoed the following day) about the issue of crowdsourcing vs. authenticity. Most challenging question had to go to 'What is the Queen doing about Crown copyright?' - to which the only honest answer is 'I don't know'!

On the day of the Digitization Fair itself, I arrived bright and early at the Ripley Lecture Hall, situated underground beneath what the Smithsonian affectionately calls 'The Castle' - an imposing red-brick building which sits beside the National Mall looking for all the world like a portal to another dimension. We were welcomed by Secretary G Wayne Clough, the Smithsonian's imposing yet genial Director. Secretary Clough, it turns out, is a Dylan fan, and he introduced the theme of the day with 'the times, they are a changin'. Digital, he acknowledged, is simply part of daily life for millions of people all over the world. As such, it provides an unprecedented opportunity for the Smithsonian to deliver its public mission beyond the walls of its institutions.

It is rare to find a major museum Director who not only understands the web and social media, but really grasps its far-reaching implications for audiences and for culture. It was a real pleasure to hear someone who enjoys the ear of the US Senate talk with such fluency about the possibilities of digital engagement and participation.

Following Secretary Clough, there was a quickfire round of 1-minute presentations from Smithsonian staff who were manning stands in the 'fair' section of the event (of which more in a moment). There were so many in such quick succession that I can't capture them all, but the whole thing took place against the backdrop of a massive online stopwatch ensuring everyone kept to their minute or less. Highlights from these presentations included:

  • A guy who provides GIS services to any Digital project at the Smithsonian (if you need places in your project, come talk to me!)
  • The 'laser cowboys' - the Smithsonian's 3D wizards and their 3D-printing kit (see also below)

Following on from the quickfire presentations was an excellent talk from Smithsonian CIO Deron Burba. Mr Burba has one of the most interesting jobs in the world, and it was fascinating to hear him talk about the challenges of Information Management at scale across the Institution - particularly the project they currently have underway to integrate their Digital Asset Management system (currently 2.3m assets and counting) with their KE EMu Collections system.

Next, Gunther had an opportunity to set out his vision of the development of the Digitization programme. By working with the teams and departments all over the Smithsonian, he has identified 10% of their collections as an urgent priority for Digitization. Interestingly, this prioritisation is mainly internal - emphasising resources which collections staff and curators know to be of prime importance in their collections. In time, the Smithsonian may work towards a more open and collaborative model of prioritisation, but this first pass already leaves them with a Digitization programme of nearly 14 million objects.

Gunther's vision is of a centrally-coordinated strategy which is expressed and delivered via operational plans within each curatorial department. The actual Digitization process itself will be supported through common, organisation-wide standards and possibly through access to shared services and technologies. The idea is not to constrain digitization activity across the whole Institution, but to ensure that it takes place within a common and consistent framework.

I was up next, and it was fantastic to have an opportunity to address an audience of people who are steeped in collections and digitization every day of their lives. You can see my presentation embedded below and on Slideshare, but I think the key points were:

  • Don't expect to make large amounts of money from digital cultural content
  • Digitization is not a project - it works best when embedded into ongoing collections development
  • The ENUMERATE project has generated lots of useful statistics on how different nations are approaching Digitization

(I know, I know, it's a long way to fly just to say that, but I did say other things too! See the archive of the #si20 hashtag for more responses).

The questions following my presentation were really interesting. There was a discussion about how and where crowdsourcing fits into curatorial practice. I am increasingly convinced that (as Mia Ridge has been saying for years) there is more than one kind of crowd and that our attitudes towards and relationship with user-generated content need to be tailored to accommodate differing degrees of complexity and expertise. I was able to draw on our recent experience of the Open Digitisation project, and lessons from FreeBMD and FreeBSD about the importance of understanding and doing justice to the personal and emotional commitment people make when they participate in crowdsourced projects. It is vital not to destroy the trust implied in these projects by marginalising their input. At the same time, the museum needs to negotiate between 'expert crowds' - small external authoritative and/or professional groups outside the museum - and general interest in participation.

The new 'GoSmithsonian' visitor app

There was also a great question about how the mainstreaming of technology into Collections practice is affecting the skills base of curators and Collections Managers. The Collections Trust is currently working on a project to give shape to the body of professional knowledge, attitudes and skills which Collections Managers need to have, and the need for digital literacy is very much incorporated into this.

Then it was time for lunch, and an opportunity to go and visit the 'fair'. This comprised 20-25 stands, each representing a different project within the Smithsonian. The standouts for me were:

  • The 3D Digitisation and Modelling team, who were doing incredible work to scan, model and 3D-print the Smithsonian's collections. They had two 3D printers on their stand and a robotic laser-scanning arm. They showed how the Qube printer (which retails at just $3500) is able to print good-quality models (in this case of Lincoln's life mask) in under two hours. We spoke about the immense disruptive potential of 3D printing in the consumer environment, and the opportunities it presents for museums. They told me about a supplier they are working with who will use a 3D scan of an object to create cardboard-lattice packaging which is cheap, highly-durable and precision-cut to the contours of the object. Imagine how that could reduce your insurance premiums for loans!

    (In a related project, a Smithsonian curator found himself in South America in a whale-skeleton site that had recently been discovered. On finding that the area was to be redeveloped within weeks, he invited the 3D Digitization team down with their robotic arm and asked them to scan the skeletons. The result was a 3D scan which was not only printed out as a highly-accurate replica, but which has also been shared with academic and research institutions worldwide.)
  • A team who are using a CT scanner to create 3D digital models of the inside and exterior of ape skulls, both to enable them to create accurate models and to provide a research resource. The digital models of the skulls are compressed and made available online through the Smithsonian's website. They already have hundreds of models, but estimate that they have completed just 30% of their collections.
  • A team who are pioneering 'rapid Digitization' among curatorial teams by using Photoshop's in-built 'pixel blend' technology. This works by allowing a user to take multiple photographs using a standard digital camera, and then 'blending' them into a single high-resolution image. I have looked at similar technologies before and have always found that there are some artefacts of the blending process, particularly at high zoom. With pixel blend (which is built in as standard with Photoshop 5 and 6), however, the software chooses each individual pixel on the basis of 'best pixel' and provides a much smoother blend without visible artefacts.
  • A team who are creating a 3D digital model of the Smithsonian itself using a standard digital camera and some panorama/blending tools. This is delivered online and as a mobile application as the Smithsonian's 'interactive map' of the institution. The really clever bit, though, is that they take new images on an ongoing basis, so that earlier models now serve as a visual historic record of how the museums' displays have changed. In some areas this now makes it possible to take a virtual tour not only of the Smithsonian, but also of a favourite display from an earlier period.
  • A team who are responsible for the Smithsonian's internal IT infrastructure. As an iconic national institution, the Smithsonian faces not just the challenge of bandwidth and moving data, but also of cybersecurity. The guy I spoke to reminded me that the Smithsonian's first web presence was developed 20 years ago (pre-dating most browsers) and that until relatively recently, the entire Institution's infrastructure ran through a single cable. It was good to be reminded that all of these digital aspirations come down to electrons and wire (or glass). If you are interested in the Smithsonian's work on cybersecurity, have a look at the excellent blog post on the subject from the office of their Chief Information Officer.

Suitably inspired, it was time to head into the afternoon sessions. There were 3 parallel tracks, so I can only feed back on the ones I attended:

Life-mask of President Abraham Lincoln produced by the Smithsonian's 'laser cowboys' - the 3D printing team

The 'Sharing the Digital Wealth' session featured 3 titans of open content - Seb Chan (@sebchan), Mike Edson (@mpedson) and Sara Snyder (@sosarasays), ably chaired by Darren Milligan (@darrenmilligan). If you're unfamiliar with the work they're all doing, I highly recommend seeking them out (YouTube, twitter, LinkedIn and Google). [Addition - Darren Milligan has since forwarded the URL for the outstanding work he is doing on opening up the Smithsonian's collections for education using technology. They've done some fascinating work on teacher engagement with digital resources, and it's well worth a look.]

Mike spoke with great passion about the projects going on around the world, including Europeana, to open up collections for creative re-use. It is clear that the Smithsonian needs something of a kick to make the final push for the Smithsonian Commons, and Mike's presentation was as powerful for what it left unsaid as for what it said!

Seb Chan (formerly of the Powerhouse, now at the Cooper Hewitt Museum in New York) spoke about his work to open up collections data as 'cultural source code', and how the Cooper Hewitt has made its content available via Github to inspire and support a new generation of developers in embedding it into applications. See Seb's blog post 'Releasing the Collection on GitHub' for details! I had annoyed Seb earlier in the day by badmouthing Hackdays, and it was good to see him showing how cultural content can be used to drive new applications. He also finished his presentation with the signoff 'kthxbai', which makes him a Digital Native, I believe.

The always-cool Sara Snyder then presented her work in opening up the resources of the Archives of American Art, including via Wikipedia - see Sara's brilliant blog post 'A Very Wiki Summer' for more info. She noted the difference between an image on their website that had been viewed 27 times in the preceding 9 months vs 2,600 views on Wikipedia (supported by articles and new insights). As Mike Edson noted, this is a fantastic way for cultural heritage to flush bad copies out of the Internet and replace them with authoritative, high-quality content linked back to the institution.

You can see a shot of the visual notes of the session (yes, they took visual notes) via (Update: there is a fully-browsable set of the visual notes here on Photosynth - well worth a visit if only to marvel at the note-taker's skill!)

For my final session of the day, I went to 'Mostly Metadata', chaired by Suzanne Plisk with Joe Depasquale, Kara Lewis and Carolyn Sheffield. Joe is one of those people that are quietly changing the world, but was incredibly humble about his work. He deals with astronomical data and images, and has found ways of embedding much richer metadata into his images in such a way that it maps to industry standards like IPTC. This way, when an image travels (e.g. is uploaded to Flickr Commons), other platforms can dynamically extract and display the accurate information alongside the image. He showed an example of how Flickr Commons automatically extracts his astronomical data and displays it as a comment on the image. He also uttered *the* best line of the event when he said 'This is NGC1929. It's a Superbubble in Space. But that's irrelevant just now' (you can find out more about NASA's Chandra X-ray Observatory and what they're doing with the data by following @ChandraXray on twitter).
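
To make the idea of mapping project-specific metadata onto a shared standard a little more concrete, here is a minimal Python sketch. It is not Joe's actual workflow, which the talk did not spell out in this detail. The left-hand field names and the `to_iptc` and `render_caption` helpers are invented for illustration, the right-hand labels are standard IPTC photo metadata field names, and the step of actually writing the fields into the image file is not shown.

```python
# Hypothetical mapping from a project's own astronomy fields to IPTC-style
# field names. The left-hand keys are invented for illustration.
FIELD_MAP = {
    "object_name": "Headline",
    "summary": "Description",
    "observatory": "Credit Line",
    "subjects": "Keywords",
}


def to_iptc(record: dict) -> dict:
    """Translate a record into IPTC-style fields, dropping unmapped keys."""
    return {FIELD_MAP[key]: value for key, value in record.items() if key in FIELD_MAP}


def render_caption(iptc: dict) -> str:
    """Build the kind of text a host platform might show beside the image."""
    parts = [
        iptc.get("Headline", ""),
        iptc.get("Description", ""),
        iptc.get("Credit Line", ""),
    ]
    return " | ".join(part for part in parts if part)


record = {
    "object_name": "NGC 1929",
    "summary": "A superbubble in space",
    "observatory": "Chandra X-ray Observatory",  # placeholder credit for the example
    "exposure_hint": "internal note",  # not in FIELD_MAP, so it is dropped
}

iptc = to_iptc(record)
caption = render_caption(iptc)
print(caption)
```

The point of the design is that any platform which understands the shared field names can rebuild a sensible caption without knowing anything about the originating project's internal schema.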

Next up, Carolyn Sheffield spoke about the Field Book project (see for details). Field books are an invaluable source of information about discoveries, journeys and interesting marginalia from curators and collectors. The project has just crossed its milestone of 6000 digitized records, and Carolyn spoke about their work to develop a metadata standard, mapped to current standards such as Encoded Archival Description but designed to support crowdsourcing and transcription.

Finally, Kara Lewis, the Collections Information Programme Manager at the National Museum of the American Indian, gave a fascinating update on their work to review and improve internal information systems at the National Museum of the American Indian. Through an iterative process of review and quality-control, supported by the tools built into the NMAI Collections Management System (KE EMu), they have managed vastly to reduce variants of vocabularies such as place and people names. For example, they have reduced their list of place-name variants from 90,000 to 16,000 simply through iterative de-duplication. Having trialled the methodologies successfully at the National Museum of the American Indian, she hopes that they will ultimately be rolled out across the Institution. Kara is a real visionary about the connection between digitization, knowledge and information management and it was a pleasure to hear her speak on the subject - even if I am absolutely convinced that they need SPECTRUM to help them in the process!
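
For readers wondering what one pass of this kind of de-duplication might look like, here is a minimal Python sketch. It is a generic illustration and not the NMAI process or any KE EMu feature: the normalisation rules, the `normalise` and `group_variants` helpers and the sample place names are all my own assumptions.

```python
import re
import unicodedata
from collections import defaultdict


def normalise(name: str) -> str:
    """Reduce a place name to a comparison key (illustrative rules only)."""
    # Strip accents, e.g. "Fé" becomes "Fe".
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lower-case, replace punctuation with spaces, then collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", unaccented.casefold())
    return " ".join(cleaned.split())


def group_variants(names):
    """Group raw names that share the same normalised key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalise(name)].append(name)
    return dict(groups)


# Invented sample data: four spellings of one place, plus one other place.
sample = [
    "Santa Fe, New Mexico",
    "santa fe new mexico",
    "Santa Fé, New Mexico",
    "SANTA FE,  NEW MEXICO",
    "Taos, New Mexico",
]

groups = group_variants(sample)
for key, variants in groups.items():
    print(key, "<-", variants)
```

In practice each pass would surface candidate groups for a person to review, with the rules tightened or loosened before the next pass, which is where the 'iterative' part comes in.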

The responsibility for closing the day fell to Steven Puglia, Digital Conversion Services Manager at the Library of Congress. The Library's Digitization programmes are a key part of the Digital Public Library of America, and Steve is right at the sharp end of putting very, very large-scale digitization into practice. I suspect that there isn't very much about digitization standards that Steven doesn't know - he is part of the development of the Federal Agencies Digitization Guidelines Initiative (yes, 'FADGI') and a number of related initiatives which are seeking to improve and standardise quality control within digitization. Although his actual presentation isn't yet available, there is a lot of useful technical information in the Slideshare below.

And with that, it was time to bid a fond farewell to my colleagues in Washington. In all, it was an extraordinary experience to spend some time with one of the world's largest museums as it gets to grips with the questions at the heart of its Digital Strategy - how to digitize at scale, how to move forward but ensure that the output is as future-proof as possible, whether publicly-funded digitization can be enhanced and extended through commercial activity. Anybody that has tried implementing Digital Strategy in a museum will know that it can be hard enough to get individual departments to cooperate. Getting 28 museums, galleries and research centres and a zoo to work together is an extraordinary task.

On their side, they have great people and a solid base of skills and experience to build on. On the other side, they have a lack of resources and the ongoing negotiation with Congress, which puts the development of a consistent plan at risk. It was wonderful to see how they are forging ahead in areas like the use of 3D printing in a museum environment and to celebrate how much has already been achieved in rationalising and improving the flow of knowledge.

Many thanks to the Smithsonian for hosting my visit and my congratulations to everyone there that is working so hard to bring their ambitious mission to life!

Nick Poole, CEO, Collections Trust

