Posts

2025-03-27: Establishing a Baseline by Administration for the Takedown of US Government Webpages using Web Archives

Figure 1: A samhsa.gov page identified by the New York Times as removed under the second Trump administration

On February 2, 2025, Ethan Singer (@ethanpsinger) published an article in the New York Times titled "Thousands of U.S. Government Web Pages Have Been Taken Down Since Friday." Singer showed that over 8,000 webpages from at least a dozen federal agencies had been removed. The article exposed individual pages as well as entire sections of websites that had been taken down. In the full version of the article available to subscribers, Singer identified four limitations:

- This study only identified removed pages, not pages that stayed but changed.
- This study only covered the current administration on the specified day because they used live sitemaps.
- The live web can't be used to extend the study backwards to previous administrations.
- Continuing the analysis with this methodology for this administration as pages are restored or further removed requires constant mo...

2025-03-26: A Battle of Opinions: Tools vs. Humans (and Humans vs. Humans) in Sentiment Analysis

Introduction

We analyzed the sentiment of 100 tweets using three sentiment analysis tools (TextBlob, VADER, and a RoBERTa-base model) and six human raters. To measure agreement, we calculated Cohen’s Kappa for each pair of raters (including both humans and tools) and Fleiss’ Kappa for multiple raters. The results? Let’s just say consensus was hard to find. Even the human raters struggled to agree, so we took a majority vote among them and compared it with the tools. Notably, the RoBERTa-base model showed the best alignment with human rating.

Our dataset consists of 100 tweets collected using the keyword “Site C, Khayelitsha” to study residents' perceptions of safety and security in Khayelitsha Township, South Africa, as part of the Minerva Research Initiative Grant awarded by the U.S. Department of Defense in 2022. We then collected the sentiment labels by running the sentiment analysis tools listed below and also by gathering data from six human raters. This data is ava...
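
The agreement measures named in this excerpt (Cohen's Kappa for each pair of raters, Fleiss' Kappa across several raters, and a majority vote among humans) can be sketched in plain Python. This is a minimal illustration and not the post's actual code: the function names, the rater names, and the toy labels are all hypothetical, and the tie-breaking rule in the majority vote is an assumption, since the excerpt does not say how ties were handled.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters who labelled the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels per item,
    with the same number of raters for every item."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    p_bar = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        p_bar += agreeing / (n_raters * (n_raters - 1))
    p_bar /= n_items
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    if p_e == 1.0:  # every rating is the same label
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


def majority_vote(labels):
    """Most common label; ties go to the label seen first (an assumption)."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical toy data: three human raters and one tool, four tweets.
humans = {
    "rater_1": ["pos", "neg", "neu", "neg"],
    "rater_2": ["pos", "neg", "neg", "neg"],
    "rater_3": ["neu", "neg", "neu", "pos"],
}
tool = ["pos", "neg", "neu", "neg"]

# Pairwise Cohen's kappa between human raters.
for (name_a, a), (name_b, b) in combinations(humans.items(), 2):
    print(name_a, name_b, round(cohen_kappa(a, b), 3))

# Fleiss' kappa needs the labels grouped per item rather than per rater.
per_item = [list(labels) for labels in zip(*humans.values())]
print("fleiss", round(fleiss_kappa(per_item), 3))

# Compare the tool with the human majority label for each tweet.
consensus = [majority_vote(labels) for labels in per_item]
print("tool vs majority", round(cohen_kappa(tool, consensus), 3))
```

For real analyses, established implementations such as `sklearn.metrics.cohen_kappa_score` and `statsmodels.stats.inter_rater.fleiss_kappa` are likely a better choice than hand-rolled functions; the sketch only makes the arithmetic behind the two statistics explicit.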