2014 Archive-It Partners Meeting in Montgomery, AL on November 18. The meeting attendees are representatives from Archive-It partners with interests ranging from archiving webpages about art and music to archiving government webpages. (Presentation slides are now available on the Archive-It wiki.) This is ODU's third consecutive Partners Meeting (see trip reports from 2012 and 2013).
The morning program was focused on presentations from partners who are building collections. Here's a brief overview of each of those.
Penny Baker and Susan Roeper from the Clark Art Institute talked about their experience in archiving the 2013 Venice Biennale international art exhibition (Archive-It collection) and plans for the upcoming exhibition. Their collection includes exhibition catalogs, monographs, and press releases about the event. The material also includes a number of videos (mainly from Vimeo), which Archive-It can now capture.
Beth Downs from the Montana State Library (Archive-It collection) spoke about working with partners around the state to fulfill the state mandate to make all government documents publicly available, and about making those materials accessible to state employees, librarians, teachers, and the general public. One of the nice things they've added to their site footer is a Page History link that goes directly to the Archive-It Wayback calendar page for the current page.
Beth has also provided instructions for their state agencies on how to include the Page History link and how to embed a box for searching the archive on their pages. This could be easily adapted to point to other state government archives or to the general Internet Archive Wayback Machine.
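To make the Page History idea concrete, here is a minimal sketch of how such a link could be generated. The general Wayback Machine pattern (`https://web.archive.org/web/*/<url>`) lists all captures of a page. The Archive-It pattern and the collection ID `1234` used below are assumptions for illustration, and the function names are hypothetical; this is not the Montana State Library's actual code.

```python
from html import escape
from typing import Optional


def wayback_calendar_url(page_url: str) -> str:
    """Calendar page for page_url in the general Internet Archive Wayback Machine.

    The "*" in place of a timestamp asks for the list of all captures.
    """
    return "https://web.archive.org/web/*/" + page_url


def archive_it_calendar_url(collection_id: int, page_url: str) -> str:
    """Calendar page for page_url within one Archive-It collection.

    Assumption: the URL pattern below is illustrative; check it against
    your own collection before relying on it.
    """
    return f"https://wayback.archive-it.org/{collection_id}/*/{page_url}"


def page_history_link(page_url: str,
                      collection_id: Optional[int] = None,
                      label: str = "Page History") -> str:
    """Build an HTML footer link to the archive calendar for page_url."""
    if collection_id is None:
        href = wayback_calendar_url(page_url)
    else:
        href = archive_it_calendar_url(collection_id, page_url)
    # Escape the href so query strings with "&" stay valid inside the attribute.
    return f'<a href="{escape(href, quote=True)}">{escape(label)}</a>'


if __name__ == "__main__":
    # 1234 is a hypothetical collection ID.
    print(page_history_link("http://example.com/reports?year=2014", 1234))
    print(page_history_link("http://example.com/reports?year=2014"))
```

Omitting `collection_id` switches the link to the general Wayback Machine, which is the adaptation mentioned above.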
Dory Bower from the US Government Printing Office talked about the FDLP (Federal Depository Library Program) Web Archive (Archive-It collections). They have several archiving strategies and use Archive-It mainly for the more content-rich websites along with born-digital materials.
Heather Slania, Director of the Betty Boyd Dettre Library and Research Center at the National Museum of Women in the Arts (Archive-It collections), spoke about the challenges of capturing dynamic content from artists' websites. This includes animation, video (mainly Vimeo), and other types of Internet art. She has initially focused on capturing websites of a selection of Internet artists. These sites include over 6000 videos (from just 30 artists). The next step is to archive the work of video artists and web comics. As part of this project, she has been considering what types of materials are currently capturable and categorizing the amount of loss in the archived sites. This is related to our group's recent work on measuring memento damage (pdf, slides) and investigating the archivability of websites over time (pdf at arXiv, slides).
Nicholas Taylor from Stanford University Libraries gave an overview of the 2013 NDSA (National Digital Stewardship Alliance) Survey Report (pdf). The latest survey was conducted in 2013 and the first was done in 2011. NDSA's goal is to conduct the survey every two years. Nicholas had lots of great stats in his slides, but here are a few that I noted:
- 50% of respondents were university programs
- 7% were affiliated with IIPC, 33% with NDSA, 45% with the Web Archiving Roundtable, and 71% with Archive-It
- many are concerned with capturing social media, databases, and video
- about 80% of respondents are using external services for archiving, like Archive-It
- 80% haven't transferred data to their local repository
- many are using tools that don't support WARC (but the percentage using WARC has increased since 2011)
After these presentations, it was time for lunch. Since we were in Alabama, I found my way to Dreamland BBQ.
After lunch, the presentations focused on collaborations, an update on 2014-2015 Archive-It plans, BOF breakout sessions, and strategies and services.
Anna Perricci from Columbia University Libraries spoke about their experiences with collaborative web archiving projects (Archive-It collections), including the Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) collection and the Contemporary Composers Web Archive (CCWA) collection.
Kent Underwood, Head of the Avery Fisher Center for Music and Media at the NYU Libraries, spoke about web archiving for music history (Archive-It collection). Kent gave an eloquent argument for web archiving: "Today’s websites will become tomorrow’s historical documents, and archival websites must certainly be an integral part of tomorrow’s libraries. But websites are fragile and impermanent, and they cannot endure as historical documents without active curatorial attention and intervention. We must act quickly to curate and preserve the memory of the Internet now, while we have the chance, so that researchers of tomorrow will have the opportunity to discover their own past. The decisions and actions that we take today in web archiving will be crucial in determining what our descendants know and understand about their musical history and culture."
Patricia Carlson from Mount Dora High School in Florida spoke about Archive-It's K-12 Archiving Program and its impact on her students (Mount Dora's Archive-It collection). She talked about its role in introducing her students to primary sources and metadata. She's also been able to use things that they already do (like tagging people on Facebook) as examples of adding metadata. The students have even made a video chronicling their archiving experiences.
After a short break, we divided up into BOF groups:
- Archive.org v2
- Researcher Services
- Cross-archive collaboration
- QA (quality assurance)
- Archiving video, audio, animations, social media
- State Libraries
After the BOF breakout, the final session featured talks on strategies and services.
First up was yours truly (Michele Weigle from the WS-DL research group at Old Dominion University). My talk was a quick update on several of our ongoing projects, funded by the NEH Office of Digital Humanities and the Columbia University Libraries Web Archiving Incentives program.
The tools I mentioned (WARCreate, WAIL, and Mink) are all available from our Software page. If you try them out, please let us know what you think (contact info is on the last slide).
Mohamed Farag from Virginia Tech's CTRnet research group presented their work on an event-focused crawler (EFC). Their previous work on automatic seed generation from URIs shared on Twitter produced lots of seeds, but not all of them were relevant. The new work allows a curator to select high-quality seed URIs and then uses the EFC to retrieve webpages that are highly similar to the seeds. The EFC can also read WARCs and perform text analysis (entities, topics, etc.) on them. This enables event modeling: describing what happened, where, and when.
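The idea of keeping only pages that are "highly similar to the seeds" can be illustrated with a small sketch. This is not the EFC's actual algorithm, which the talk summary does not specify; it swaps in plain term-frequency cosine similarity, and the function names and the `0.3` threshold are hypothetical choices for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def relevance(candidate_text, seed_texts):
    """Score a candidate page by its best similarity to any curator-chosen seed."""
    cand = term_vector(candidate_text)
    return max((cosine(cand, term_vector(s)) for s in seed_texts), default=0.0)


def filter_candidates(candidates, seed_texts, threshold=0.3):
    """Return candidate URIs scoring at least `threshold`, best first.

    `candidates` maps URI -> extracted page text. The threshold is an
    arbitrary illustrative value, not one taken from the EFC work.
    """
    scored = [(relevance(text, seed_texts), uri) for uri, text in candidates.items()]
    return [uri for score, uri in sorted(scored, reverse=True) if score >= threshold]


if __name__ == "__main__":
    seeds = ["earthquake strikes city damage reported"]
    pages = {
        "http://a.example/": "earthquake damage reported in the city",
        "http://b.example/": "recipe for chocolate cake",
    }
    print(filter_candidates(pages, seeds))
```

A real focused crawler would apply a score like this while deciding which outgoing links to follow, rather than filtering a fixed set of pages after the fact.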
In the final presentation of the meeting, Jefferson Bailey and Vinay Goel from Internet Archive spoke about building Archive-It Research Services, planned to launch in January 2015. The goals are to expand access models to web archives, enable new insights into collections, and facilitate computational analysis. The plan is to leverage the Internet Archive's infrastructure for large-scale processing. This could increase the use, visibility, and value of Archive-It collections. Initially, three main types of datasets are planned:
- WAT - key metadata from a WARC file, including text data (title, meta-keywords, description) and link data (including anchor text) for HTML
- LGA - longitudinal graph analysis - what links to what over time
- WANE - web archive named entities
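As a rough illustration of what "what links to what over time" means for the LGA dataset, the sketch below groups link observations by edge and records when each link was seen. The input format (tuples of a 14-digit capture timestamp, source URI, and target URI) and the function names are assumptions for illustration only; the actual LGA file format was not described in the talk summary.

```python
from collections import defaultdict


def link_history(captures):
    """Map each (source, target) link to the sorted timestamps at which it was seen.

    `captures` is an iterable of (timestamp, source, target) tuples, where
    timestamp is a 14-digit string such as "20130601000000" (hypothetical input).
    """
    history = defaultdict(set)
    for timestamp, source, target in captures:
        history[(source, target)].add(timestamp)
    return {edge: sorted(stamps) for edge, stamps in history.items()}


def links_in_year(history, year):
    """Return the sorted list of links observed at least once in the given year."""
    return sorted(edge for edge, stamps in history.items()
                  if any(stamp.startswith(year) for stamp in stamps))


if __name__ == "__main__":
    observations = [
        ("20130601000000", "http://a.example/", "http://b.example/"),
        ("20140601000000", "http://a.example/", "http://b.example/"),
        ("20140601000000", "http://a.example/", "http://c.example/"),
    ]
    history = link_history(observations)
    print(links_in_year(history, "2013"))
    print(links_in_year(history, "2014"))
```

Comparing the per-year link sets shows which links appeared or disappeared between crawls, which is the kind of question a longitudinal graph is meant to answer.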
All in all, it was a great meeting with lots of interesting presentations. It was good to see some familiar faces and to actually meet others I'd only previously emailed with. It was also nice to be in an audience where I didn't have to motivate the need for web archiving.
There were several people live-tweeting the meeting (#ait14). I'll conclude with some of the tweets.