Tuesday, May 31, 2016

2016-05-31: Can I find this story? API: Yes, Google: Maybe, Native Search: No

A story on Storify titled: "Lecture on Academic Freedom" (capture date: 2016-05-31)
The story on Storify titled: "Lecture on Academic Freedom" could not be found on Google (capture date: 2016-05-31)
The story on Storify titled: "Lecture on Academic Freedom" could not be found on Storify native search (capture date: 2016-05-31)
A part of our research (funded by IMLS) to build collections for stories or events involves exploring content curation sites like Storify in order to determine if they hold quality (newsworthy, timely, etc.) content. Storify is a social network service used to create stories, which consist of text and multimedia content as well as content from other social media sites like Twitter, Facebook and Instagram.
Our exploration involved collecting stories from Storify over a period of time in order to manually inspect them and determine their newsworthiness. The exploration had two parts: we collected the latest stories (across multiple topics) from the Storify API (browse/latest interface), and we collected stories about the Ebola virus through Storify's search API. During this period we also collected resources from Google (with the "site:storify.com" directive). At one point in our exploration, we considered whether we could rely exclusively on Storify search, or on Google's site directive, to find Storify stories. In other words, how well do Storify's native search and Google search discover stories that we already know exist because the Storify browse/latest API returned them?
Storify API vs Google and Storify native search: A simple plan for measuring discovery
We focused on known-item searches to avoid the problem of subjective relevance measures. This gave us a very simple way of scoring Google and Storify's native search: if Google finds a specific story (using a query extracted from the story's exact title, body content and description), Google gets 1 point. Likewise, if Storify's native search finds the story with the same query, Storify gets 1 point.
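The scoring rule above can be sketched in a few lines of Python. This is only an illustration, not the procedure we ran (our check was manual): the function names, the example URLs, and the URL normalization (ignoring scheme, a leading "www.", and a trailing slash) are all assumptions made for the sketch.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so trivial differences do not cause a miss.

    Assumption for this sketch: scheme, a leading "www." and a trailing
    slash are irrelevant when deciding whether two links are the same story.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def known_item_score(story_url, serp_links):
    """Return 1 if the known story appears among the result links, else 0."""
    target = normalize(story_url)
    return int(any(normalize(link) == target for link in serp_links))


# Hypothetical story URL and result links, for illustration only.
story = "https://storify.com/example_user/example-story"
links = [
    "https://storify.com/another_user/unrelated-story",
    "http://www.storify.com/example_user/example-story/",
]
print(known_item_score(story, links))
```

A search service's total is then the sum of this score over all test stories.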
Our test set consisted of 10 stories created between February 2016 and March 2016 (enough time for both search services to index them), together with the corresponding queries generated from the story titles, body content and description snippets. These stories were collected from the Storify browse/latest API interface, which allows for discovery of content but, unlike search, does not allow us to find topical content. Here is the list of stories (collected 2016-05-30) and their respective creation datetime values, as well as the results outlining stories found by Google and/or Storify's native search:

| Story | Creation datetime | Found? (Google) | Found? (Storify) |
|---|---|---|---|
| Commandos 2: Men of Courage full game free pc, download, play. download Commandos 2: Men of Courage for pc | 2016-02-22T22:36:03 | Yes | No |
| #SJUtakeover | 2016-02-17T21:16:43 | Yes | No |
| Annotations for Edgar Allan Poe | 2016-03-02T19:47:31 | No | No |
| Lecture on Academic Freedom | 2016-02-22T22:27:08 | No | No |
| Hitman: Codename 47 full game free pc, download, play. download Hitman: Codename 47 for pc | 2016-02-22T22:36:26 | Yes | No |
| AU Game Lab at GDC 2016 | 2016-03-18T17:36:34 | Yes | No |
| 5 Leading Onlinegames For Females Cost Free | 2016-02-22T22:37:22 | Yes | No |
| Sony Ericsson Z610i (Pink): newest cellular Phone With Advanced attributes | 2016-03-18T23:50:55 | No | No |
| Senior Research Paper | 2016-02-26T19:47:19 | Yes | No |
| Syracuse community reacts to NCAA Tournament win over Dayton | 2016-03-18T17:38:34 | Yes | No |

We searched for the stories by issuing fully quoted queries (for exact match) to Google search (with the "site:storify.com" directive) and to Storify's native search, and counted the number of hits and misses for both. For both Google and Storify, all SERP links were included in the test. The results from Google did not exceed 1 page; for Storify, however, the average number of results was 20 stories.
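As a rough sketch of how such queries can be put together, the snippet below wraps a title in quotes and appends the site directive for Google. The helper names are hypothetical, and the search URL is built only to show the encoding; we issued our queries through the regular search pages.

```python
from urllib.parse import quote_plus


def exact_match_query(text):
    """Wrap text in double quotes so the search engine treats it as a phrase."""
    # Embedded double quotes would end the phrase early, so drop them.
    return '"' + text.replace('"', "") + '"'


def google_site_query(text, site="storify.com"):
    """Exact-match query restricted to one site with Google's site: directive."""
    return f"{exact_match_query(text)} site:{site}"


def google_search_url(text):
    """URL-encode the restricted query into a Google search URL."""
    return "https://www.google.com/search?q=" + quote_plus(google_site_query(text))


title = "Lecture on Academic Freedom"
print(exact_match_query(title))   # query for Storify's native search
print(google_site_query(title))   # query for Google
print(google_search_url(title))
```

The same quoted string, without the site directive, is what goes to Storify's native search.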
Storify's native search finds 0/10 stories, Google finds 7/10
We expected Storify to find more stories than Google, since the content resides on Storify, but this was not the case: out of 10 stories, Google found 7 but Storify found none! Google found all except the following stories:
  1. Annotations for Edgar Allan Poe
  2. Lecture on Academic Freedom
  3. Sony Ericsson Z610i (Pink): newest cellular Phone With Advanced attributes
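For readers who want to recompute the tally, here is a minimal Python sketch that encodes the results reported above (title, found by Google, found by Storify) and counts the hits. The variable names are ours, chosen for illustration; the data are the ten results from this post.

```python
# (title, found by Google, found by Storify's native search)
results = [
    ("Commandos 2: Men of Courage full game free pc, download, play. "
     "download Commandos 2: Men of Courage for pc", True, False),
    ("#SJUtakeover", True, False),
    ("Annotations for Edgar Allan Poe", False, False),
    ("Lecture on Academic Freedom", False, False),
    ("Hitman: Codename 47 full game free pc, download, play. "
     "download Hitman: Codename 47 for pc", True, False),
    ("AU Game Lab at GDC 2016", True, False),
    ("5 Leading Onlinegames For Females Cost Free", True, False),
    ("Sony Ericsson Z610i (Pink): newest cellular Phone With Advanced attributes",
     False, False),
    ("Senior Research Paper", True, False),
    ("Syracuse community reacts to NCAA Tournament win over Dayton", True, False),
]

# Each hit is worth 1 point, so the score is simply the number of True values.
google_hits = sum(google for _, google, _ in results)
storify_hits = sum(storify for _, _, storify in results)
missed_by_google = [title for title, google, _ in results if not google]

print(f"Google: {google_hits}/{len(results)}")    # Google: 7/10
print(f"Storify: {storify_hits}/{len(results)}")  # Storify: 0/10
for title in missed_by_google:
    print("Missed by Google:", title)
```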
A story on Storify titled: "#SJUTakeover" (capture date: 2016-05-31)

The story on Storify titled: "#SJUTakeover" could not be found on Storify search but found on Google (capture date: 2016-05-31)
Before our test, we checked for a Storify option to exclude a story from search at creation time and did not find one, so the missing stories were unlikely to have been deliberately hidden from search. Consequently, our test result suggests that the Storify search index is not synchronized with its browse/latest API interface. This investigation also shows the utility of using the Storify API for discovery, which contrasts with some of our previous experiences where APIs provided different, limited, or stale data (e.g., Delicious API, SE APIs).
A proposal for a comprehensive study
We acknowledge that the sample size of our experiment is very small. However, because the stories were selected at random, the preliminary results could approximate those of a larger study. The curious reader may consider verifying our result through a larger test consisting of a large collection of random stories published across a wide temporal window. If you do this, kindly share your findings with us.
--Nwala