Recently I was emailing a friend and wanted to update her about the recent buzz we have enjoyed with Hany SalahEldeen's TPDL 2012 paper about the loss rate of resources shared over Twitter. I remembered that an article in the MIT Technology Review from the Physics arXiv blog started the whole wave of popular press (e.g., MIT Technology Review, BBC, The Atlantic, Spiegel). To help convey the amount of social media sharing of these stories, I was sending links to the sites using the social media search engine Topsy. I only recently discovered Topsy, but it has quickly become one of my favorite sites. It does many things, but the part I enjoy most is the ability to prepend "http://topsy.com/" to a URI to discover how many times that URI has been shared and who is sharing it. For example:
http://www.bbc.com/future/story/20120927-the-decaying-web
becomes:
http://topsy.com/http://www.bbc.com/future/story/20120927-the-decaying-web
and you can see all the tweets that have linked to the bbc.com URI.
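The prepending trick is simple enough to script. Here is a minimal sketch in Python; the function name `topsy_uri` is made up for illustration, and it assumes only that Topsy accepts the target URI appended verbatim, as in the example above:

```python
TOPSY_PREFIX = "http://topsy.com/"

def topsy_uri(uri: str) -> str:
    """Return the Topsy lookup URI for `uri` by prepending the Topsy prefix."""
    return TOPSY_PREFIX + uri

print(topsy_uri("http://www.bbc.com/future/story/20120927-the-decaying-web"))
```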
While composing my email I recalled the Technology Review article was one of the first (September 19, 2012) and most popular, so I did a Google search for the article and converted the resulting URI from:
http://www.technologyreview.com/view/429274/history-as-recorded-on-twitter-is-vanishing-from-the-web-say-computer-scientists/
to:
http://topsy.com/http://www.technologyreview.com/view/429274/history-as-recorded-on-twitter-is-vanishing-from-the-web-say-computer-scientists/
I was surprised when Topsy reported 0 posts about the MIT TR story, because I recalled the number being quite large. I thought it might be a transient error and didn't think much more about it until later that night, when I was on my home computer, where I had bookmarked the MIT TR Topsy URI, and it said "900 posts". Then I looked carefully: the URI I had bookmarked now issues a 301 redirection to another URI:
% curl -I http://www.technologyreview.com/view/429274/history-as-recorded-on-twitter-is-vanishing-from/
HTTP/1.1 301 Moved Permanently
Server: nginx
Content-Type: text/html; charset=utf-8
X-Drupal-Cache: MISS
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
ETag: "1352561072"
Content-Language: en
Last-Modified: Sat, 10 Nov 2012 15:24:32 GMT
Location: http://www.technologyreview.com/view/429274/history-as-recorded-on-twitter-is-vanishing-from-the-web-say-computer-scientists/
X-AH-Environment: prod
Vary: Accept-Encoding
Content-Length: 0
Date: Sat, 10 Nov 2012 15:24:32 GMT
X-Varnish: 1779081554
Age: 0
Via: 1.1 varnish
Connection: keep-alive
X-Cache: MISS
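For anyone who wants to check many bookmarked URIs this way, the two fields that matter in the output above are the status line and the Location header. Below is a rough sketch in Python that pulls them out of saved `curl -I` output. The helper name `parse_head_response` is hypothetical, and the sample is an abbreviated copy of the response above:

```python
def parse_head_response(raw: str):
    """Split saved `curl -I` output into (status_code, headers)."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    # The first line is the status line, e.g. "HTTP/1.1 301 Moved Permanently".
    status_code = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, since header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers

# Abbreviated copy of the response shown above.
sample = """\
HTTP/1.1 301 Moved Permanently
Server: nginx
Location: http://www.technologyreview.com/view/429274/history-as-recorded-on-twitter-is-vanishing-from-the-web-say-computer-scientists/
Content-Length: 0
"""

status, headers = parse_head_response(sample)
if status in (301, 302, 303, 307, 308):
    print(status, "->", headers["location"])
```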
A little poking around revealed that technologyreview.com reorganized and rebranded their site on October 24, 2012, and Google had already swapped the prior URI to the article with the new URI. Their site uses Drupal, and it appears their old site did as well, but the URIs have changed. The base URIs (e.g., http://www.technologyreview.com/view/429274/) have stayed the same (and are thus almost "cool"), but the slug has lengthened from 8 terms ("history as recorded on twitter is vanishing from") to the full title ("history as recorded on twitter is vanishing from the web say computer scientists"). Slugs are a nice way to make the URI more human readable, and can be useful in determining what the URI was "about" if (or when) it becomes 404 (see also Martin Klein's dissertation on lexical signatures). The base URI will 301 redirect to the URI with the slug:
% curl -I http://www.technologyreview.com/view/429274/
HTTP/1.1 301 Moved Permanently
Server: nginx
Content-Type: text/html; charset=utf-8
X-Drupal-Cache: MISS
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
ETag: "1352563816"
Content-Language: en
Last-Modified: Sat, 10 Nov 2012 16:10:16 GMT
Location: http://www.technologyreview.com/view/429274/history-as-recorded-on-twitter-is-vanishing-from-the-web-say-computer-scientists/
X-AH-Environment: prod
Vary: Accept-Encoding
Content-Length: 0
Date: Sat, 10 Nov 2012 16:10:16 GMT
X-Varnish: 1779473907
Age: 0
Via: 1.1 varnish
Connection: keep-alive
X-Cache: MISS
But this redirection is transparent to the user, so the tweets that Topsy analyzes all link to the versions with slugs. This results in two URIs for the article: the version from Sept 19 to Oct 24 that has 900 tweets, and the version from Oct 24 to now that currently has 3 tweets (up from 0 when I first noticed this). technologyreview.com is to be commended for not breaking the pre-update URIs (see the post about how ctv.ca handled a similar situation) and for issuing 301 redirections to the new versions, but it would have been preferable to have maintained the old URIs completely. Perhaps the new software installation has a different default slug length; I'm not familiar with Drupal, and in the code examples I can find, a limit is not defined.
Splitting PageRank with URI aliases is a well-known problem that can be addressed with 301 redirects. This is why most URI shorteners like bitly issue 301 redirects instead of 302s: the PageRank will accumulate at the target and not the short URI. It would be nice if Topsy also merged redirects when computing their pages, or at least provided that as an option. In the example above, that would result in either of the Topsy URIs (pre- and post-October 24) reporting 900+3 = 903 posts.
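To make the suggestion concrete, here is a small sketch in Python of what merging by redirect target could look like. It is not how Topsy works internally (I have no knowledge of that). The function names and the `redirects` mapping are assumptions for illustration, and the counts are the ones reported above:

```python
from collections import defaultdict

def resolve(uri, redirects, max_hops=10):
    """Follow known 301 redirects from `uri` to its final target."""
    hops = 0
    # Cap the number of hops so a redirect loop cannot spin forever.
    while uri in redirects and hops < max_hops:
        uri = redirects[uri]
        hops += 1
    return uri

def merged_counts(counts, redirects):
    """Sum per-URI post counts at each URI's final redirect target."""
    totals = defaultdict(int)
    for uri, n in counts.items():
        totals[resolve(uri, redirects)] += n
    return dict(totals)

BASE = "http://www.technologyreview.com/view/429274/"
OLD = BASE + "history-as-recorded-on-twitter-is-vanishing-from/"
NEW = BASE + "history-as-recorded-on-twitter-is-vanishing-from-the-web-say-computer-scientists/"

redirects = {OLD: NEW}        # the 301 from the short slug to the full-title slug
counts = {OLD: 900, NEW: 3}   # post counts Topsy reported for each URI

totals = merged_counts(counts, redirects)
print(totals[NEW])  # 900 + 3 = 903
```

A lookup for either URI would first call `resolve` and then report the merged total, so the pre- and post-October 24 URIs would show the same number.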
--Michael
Edit: I did some more investigating and found that the slug doesn't matter, only the Drupal node ID of "429274" (those familiar with Drupal probably already knew that). Here's a URI that should obviously return 404 but instead redirects to the URI with the full title as the slug:
% curl -I http://www.technologyreview.com/view/429274/lasdkfjlajfdsljkaldsf/
HTTP/1.1 301 Moved Permanently
Server: nginx
Content-Type: text/html; charset=utf-8
X-Drupal-Cache: MISS
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
ETag: "1352581871"
Content-Language: en
Last-Modified: Sat, 10 Nov 2012 21:11:11 GMT
Location: http://www.technologyreview.com/view/429274/history-as-recorded-on-twitter-is-vanishing-from-the-web-say-computer-scientists/
X-AH-Environment: prod
Vary: Accept-Encoding
Content-Length: 0
Date: Sat, 10 Nov 2012 21:11:11 GMT
X-Varnish: 1782237238
Age: 0
Via: 1.1 varnish
Connection: keep-alive
X-Cache: MISS
This makes the Drupal slug very close to the original Phelps & Wilensky concept of "Robust Hyperlinks Cost Just Five Words Each", which formed the basis for Martin's dissertation mentioned above. While this is convenient in that it reduces the number of 404s in the world, it is also a bit of a white lie; user agents need to be careful to not assume that the original URI ever existed even though it is issuing a redirect to a target URI.
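Since only the node ID matters, a client that wants to treat these aliases as one resource could compare URIs by host and node ID and ignore the slug. The following Python sketch assumes the `/view/<node id>/<slug>/` layout seen in the URIs above. The pattern and the function name `node_key` are my own and are not taken from Drupal:

```python
import re
from urllib.parse import urlsplit

# Assumed URI layout: /view/<node id>/<optional slug>/
NODE_PATH = re.compile(r"^/view/(\d+)(?:/|$)")

def node_key(uri):
    """Return (host, node id) for a /view/<id>/... URI, or None if it does not match."""
    parts = urlsplit(uri)
    match = NODE_PATH.match(parts.path)
    if match is None:
        return None
    return (parts.netloc.lower(), match.group(1))

uris = [
    "http://www.technologyreview.com/view/429274/",
    "http://www.technologyreview.com/view/429274/history-as-recorded-on-twitter-is-vanishing-from/",
    "http://www.technologyreview.com/view/429274/lasdkfjlajfdsljkaldsf/",
]
keys = {node_key(u) for u in uris}
print(keys)  # all three URIs share a single key
```

Note that a shared key only says the URIs lead to the same target. As cautioned above, it says nothing about whether a given alias was ever actually published.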
Saturday, November 10, 2012
Wednesday, November 7, 2012
2012-11-06: TPDL 2012 Conference
It all started last April, on the 9th to be exact, when I received an email from Dr. George Buchanan delivering the good news: my paper had been accepted at the annual international conference on Theory and Practice of Digital Libraries, TPDL 2012. As the Program Chair, Dr. Buchanan sent me the reviews and feedback on my paper, entitled “Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?”, which paved the way for the following months of preparation to present it.
In addition to the paper, Dr. Nelson gave me permission to submit my PhD proposal for consideration at the Doctoral Consortium at the conference. Scoring my second goal, I received a delightful email from Dr. Birger Larsen and Dr. Stefan Gradmann announcing the committee's acceptance of my proposal, and I was invited to present my work at the consortium the day before the conference.
The hat-trick came a few weeks before the conference in the form of an email from Dr. Larsen proposing that I also present my doctoral consortium work at the poster session on the first day of the conference. Overwhelmed with joy, I gladly accepted this gracious invitation and started working on the poster.
After an 8-hour drive to New York and a couple of flights, I arrived at Larnaca airport in Cyprus. I can't complain, because two of the activities closest to my heart are driving and travelling. Anyway, I took the bus from the airport to Limassol and was supposed to take another bus to Paphos, where the conference was held, but unfortunately it didn't come. After a quick chat with two French ladies who happened to be heading to Paphos too, we shared a taxi there, and I finally arrived at the hotel where I would be spending the following nights, the Cynthiana Beach Hotel. With a captivating view and spacious suites, the hotel was located 10 minutes by bus from the conference venue and halfway to the center of the city.
After giving my presentation at the doctoral consortium and receiving questions and feedback, I sat down and listened to the work of fellow PhD students: Tuan VuTran, Armand Brahaj, and Nut Limsopatham. Shortly after we wrapped up the consortium, Dr. Larsen and Dr. Gradmann took us to the city pier for an authentic Cypriot dinner. The food, the atmosphere, and the company were marvelous. Later that night I arrived back at the hotel exhausted.
The next morning the conference commenced. Following the welcome notes by Dr. Buchanan, Dr. Mounia Lalmas gave a marvelous keynote speech entitled “User Engagement in the Digital World”. Dr. Lalmas is a visiting principal scientist at Yahoo! Labs Barcelona. She talked about user engagement and the emotional, cognitive, and behavioral connection between the user and the technological resource. She discussed ways to measure and model this engagement, along with selected experiments exploring those aspects.
After the keynote speech we had a short coffee break where I met some people I hadn't seen since JCDL in June. Then I headed to the 2nd track session, entitled “Analyzing and Enriching Documents”, which included several fascinating papers by Róisín Rowley-Brooke, my friend Luis Meneses, Daan Odijk, and Annika Hinze, who had 4 papers published at this conference. The lunch break followed, during which I had a phone interview with Ms. Lesley Taylor from the Toronto Star, who wrote an article about the paper I was presenting at the conference.
Following lunch I attended the session entitled “Extracting and Indexing”, where Guido Sautter, Benjamin Köhncke, and Georgina Tryfou presented their work. The Minute Madness started shortly after and was followed by the poster session.
Standing by my poster in the middle of the room, I explained my work to interested researchers in the field. After a while I started checking out the neighboring posters, and I bet my friend Clare Llewellyn drinks that she would win the best poster award with her brilliant linen cloth poster (spoiler alert: she owes me drinks now!). Later that evening, after the welcome reception, we went out for dinner and drinks at another authentic Cypriot restaurant and had a lovely time.
The following day started with the second keynote speech, by Dr. Andreas Lanitis from the Department of Multimedia and Graphic Arts at Cyprus University of Technology, entitled “On the Preservation of Cultural Heritage Through Digitization, Restoration, Reproduction and Usage”. In this captivating talk, Dr. Lanitis discussed the digital preservation of Cypriot cultural heritage artifacts, along with their restoration and reproduction.
After the coffee break I again attended the second track, entitled “Content and Metadata Quality”, where two fascinating papers were presented, one on SKOS vocabularies and the other on meta learning from wiki articles. I was fairly nervous because I was due to present my long paper in the following session, just after lunch.
During lunch I had my second phone interview, with Ms. Claire Connelly, a journalist from News Ltd in Australia who was also writing an article about our work. Following lunch, this time I joined the 1st track session, in which I was to present my work. It started with Anqi Cui presenting his interesting work on PrEV (Preserving and Providing Web Pages and User-generated Contents). To my surprise he cited my work in his presentation, and a sense of accomplishment flooded me. The next paper, entitled “Preserving Scientific Processes from Design to Publications”, analyzed scientific processes. After that I took the stage and was surprised by the large number of attendees. The questions were marvelous, and Cathy Marshall, among others, gave me very valuable feedback. Following my presentation, Ray Larson and Maria Sumbana presented the next two papers.
After the coffee break we returned for the last round of sessions, for which I again chose track 2, “Information Retrieval”, with four more papers. At 7 o'clock we gathered in the lobby to board the buses taking us to an authentic Cypriot restaurant on the outskirts of town. This one was different, as it had a band and a folk dancing group who taught us how to do Cypriot round and line dances.
The following morning I packed my bags and checked out before attending the last day of the conference, which started with a captivating talk, as usual, by Cathy Marshall from Microsoft Research San Francisco. The talk was entitled “Whose content is it anyway? Social media, personal data, and the fate of our digital legacy”, and was similar to the equally wonderful speech she gave at JCDL. Finally I attended the sessions that I had been looking forward to the most, track 2 “User Behavior”, presented by Michael Khoo and Catherine Hall, Sally Jo Cunningham, Fernando Loizides, and my friend Gerhard Gossen.
The closing session followed, concluding the conference, and the best paper, demo, and poster awards were handed to their authors, among them our friend Clare Llewellyn.
In conclusion, it was a well-organized and successful conference. Our work was represented three times (the paper, the doctoral consortium, and the poster), and I attended several interesting sessions, met old colleagues, made a lot of new contacts, and got really great feedback.
Other Blog Posts:
- Justyna Walkowska: TPDL 2012: Theory and Practice of Digital Libraries
-- Hany SalahEldeen
Labels: Conference, Cyprus, TPDL, TPDL2012, trip report