Request for removal is a Perl script that harvests all Message-IDs of articles from a certain author in the Google Groups service.
From the 1980s to the mid-1990s, Usenet was a very popular message-board-like service on the internet. It existed long before the World Wide Web and is still in use today.
Only a few people had access to the internet and could use Usenet. The community consisted mostly of technically minded men discussing technical but also private topics. Few of them imagined that all of their postings would later be archived by the web service Deja News, making them available to a new, gigantic audience during the internet boom of the 1990s.
When Deja News went out of business in 2001, Google purchased this and other archives of Usenet articles and re-released them under their own brand with Google Groups.
The problem was that a large number of discussion groups were used for chatting, exchanging local gossip, publishing poetry and so on. Many people signed their messages with their real names or talked freely about people who did not know about the internet and would probably never read anything there, right?
Today everybody's grandmother is online; what was a message intended for a small peer group has become part of another Google product. Statements made a decade ago under an illusion of safety, now searchable with Google's convenience, can lead to awkward situations.
Google offers to remove messages from its Google Groups database, but collecting one's Message-IDs is cumbersome, especially for people who have been very active on Usenet.
But times of regret are over now!
Request for removal collects all the Message-IDs from an author, identified by the email address in the posting's From header. It saves a text file with the Message-IDs. These can be copy-pasted for use with Google's removal tool.
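The collection step amounts to paging through search results until no new entries turn up. The following is a minimal sketch of that loop, not the script itself: `harvest` and the `$fetch_page` callback are hypothetical names standing in for the real HTTP request, and the canned data uses a page size of 2 where the script uses 100.

```perl
use strict;
use warnings;

# Sketch only: collect unique result identifiers page by page.
# $fetch_page is a hypothetical callback standing in for the HTTP request;
# it takes a start offset and returns a list of [hash_id, group] pairs.
sub harvest {
    my ($fetch_page, $page_size) = @_;
    my %seen;                     # hash_id => group name
    my $start = 0;
    my $more  = 1;
    while ($more) {
        my $new = 0;
        for my $hit ($fetch_page->($start)) {
            my ($id, $group) = @$hit;
            if (exists $seen{$id}) {
                $more = 0;        # a repeated entry: stop searching
            } else {
                $seen{$id} = $group;
                $new++;
            }
        }
        $more = 0 if $new < $page_size;   # a short page is the last page
        $start += $page_size;
    }
    return \%seen;
}

# Usage with canned data instead of a live request.
my @pages = (
    [ ['a1', 'de.test'], ['b2', 'de.test'] ],
    [ ['c3', 'alt.test'] ],
);
my $found = harvest(sub { my $p = $pages[ $_[0] / 2 ]; $p ? @$p : () }, 2);
print "$_ -- $found->{$_}\n" for sort keys %$found;
```

Keeping the fetch behind a callback lets the stop conditions be tried out without sending any requests.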
The script requires the Perl programming language with the CPAN modules LWP and URI.
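Before running the script, it can help to confirm that both modules are installed. A small check along these lines should do, assuming nothing beyond core Perl; `module_available` is a hypothetical helper name, and `LWP::UserAgent` and `URI::Escape` are the two modules the script loads.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded on this system, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # LWP::UserAgent -> LWP/UserAgent.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(LWP::UserAgent URI::Escape)) {
    printf "%-16s %s\n", $module,
        module_available($module) ? 'ok' : 'missing (install it from CPAN)';
}
```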
Update 2008-11-07: Julian Wiersbitzki made a small update (v1.1); the improvements are listed under "Changes" in the script header.
#!/usr/bin/perl
# ______ _______ _____ _ _ _______ _______ _______
# |_____/ |______ | __| | | |______ |______ |
# | \_ |______ |____\| |_____| |______ ______| |
#
# _______ _____ ______
# |______ | | |_____/
# | |_____| | \_
#
# ______ _______ _______ _____ _ _ _______
# |_____/ |______ | | | | | \ / |_____| |
# | \_ |______ | | | |_____| \/ | | |_____
# This script asks Google Groups for Messages created by $author and
# prints a list of all found Message-IDs.
# v1.0 released by Dragan Espenschied
# This Software is in the Public Domain.
# v1.1 released by Julian Wiersbitzki
# Changes:
# - Google changed HTML-Code of Message-Body, $message_body customized.
# - Also a file with Google-URLs for messages is created. These URLs can also be used as request for removal.
use strict;
my $author = 'i.am@example.com';      # author's email address goes here
use LWP::UserAgent;
use URI::Escape;
my %groups_messages;                  # this hash will contain all found message IDs
                        # Fake Browser
my $ua = LWP::UserAgent->new(agent => 'Mozilla/5.0 (Linux; U; appSysName i686; de; rv:1.7.5) Gecko/20041108 Firefox/1.0');
my $result_page_counter = 0;          # we are on this serp
my $more = 1;                         # are new links found or did we
                        # reach the end of the list?
while($more == 1) {                   # while new links are found
  print "SERP page: $result_page_counter\n";
  my $request_uri =               # construct uri with serp number
   'http://groups.google.com/groups?q=author%3A'.uri_escape($author).
   '&start='.$result_page_counter.
   '&hl=de&lr=&num=100&filter=0';
  my $response = $ua->get($request_uri);
  unless($response->code == 200) {       # test on HTTP error codes
   $more = 0;
   die "Error! $request_uri\nHTTP-Status: ".$response->code."\n";
  }
  my $google_result_page = $response->content; # get SERP
  # Check on google spyware detection
  if($google_result_page =~ /403 Forbidden<\/title>/m) {
    $more = 0;
    die "Error! Google thinks this is malicious software.\nPlease try again later.\n";
  }
  # look on the serp for links that contain
  # a ref to /groups, catch group name and
  # google's hash identifier
  my $counter = 0;
  while($google_result_page =~ /<a\s+href="\/group\/(\S+)\/browse_thread\/[^"]+#([0-9a-z]+)"/sg) {
    unless(exists($groups_messages{$2})) { # this is a yet unknown message
      $groups_messages{$2}{group} = $1; # save group name and google hash identifier
      print "$2 -- $1\n";
      $counter++;
    } else { # this message already appeared before
      $more = 0; # which means that we should search no more
    }
  }
  print "Found: $counter posts.\n";
  if($counter < 100) { # if there are fewer than 100 new posts on the
    $more = 0; # SERP, this is the last page for this query
  }
  $result_page_counter += 100; # increase serp number
  sleep(int(rand(5))); # wait some time not to stress google too much
}
# open the export files: one for Message-IDs, one for message URLs
open(SAVE, '>', "message_ids_for_$author.txt") or die "could not save file: $!\n";
open(SAVE2, '>', "message_urls_for_$author.txt") or die "could not save file: $!\n";
# retrieve "source" of all found messages
foreach my $google_hash (keys %groups_messages) {
  # uri contains group name and google's hash
  # identifier
  my $request_uri = 'http://groups.google.com/group/'.$groups_messages{$google_hash}{group}.
    '/msg/'.$google_hash.'?dmode=source&hl=de';
  my $response = $ua->get($request_uri);
  unless($response->code == 200) {
    $more = 0;
    die "Error! $request_uri\nHTTP-Status: ".$response->code."\n";
  }
  my $message_body = $response->content;
  # Check on google spyware detection
  if($message_body =~ /403 Forbidden<\/title>/m) {
    $more = 0;
    die "Error! Google thinks this is malicious software.\nPlease try again later.\n";
  }
  # find Message-ID from the header; the page source may contain the angle
  # brackets literally or as the HTML entities &lt; and &gt;
  my ($message_id) = $message_body =~ /Message-ID:\s*(?:<|&lt;)(\S+?)(?:>|&gt;)/i;
  $groups_messages{$google_hash}{msgid} = $message_id;
  print "http://groups.google.de/group/$groups_messages{$google_hash}{group}/msg/$google_hash\n";
  # check if message-ID is extracted
  if(!defined $message_id) {
    # if not display message
    print "Message not found, probably already deleted...\n";
  } else {
    # else print message-ID and message-URL to each file.
    print "$message_id\n";
    print SAVE "$message_id\n";
    print SAVE2 "http://groups.google.de/group/$groups_messages{$google_hash}{group}/msg/$google_hash\n";
  }
  sleep(int(rand(5))); # wait some time not to stress google too much
}
close SAVE;
close SAVE2;
-
I've had the same problem. I've spent a long time searching the code for mistakes, but I can't find any. One solution, however, is to ditch the whole last "foreach" command. Then, replace the
print "$2 -- $1\n";
with
my $export_file = open(SAVE, ">>message_ids.txt") or die "could not save file: $!\n";
print "http://groups.google.com/group/$1/msg/$2?dmode=source&hl=en \n";
print SAVE "http://groups.google.com/group/$1/msg/$2?dmode=source&hl=en \n";
This will give you a file with all the urls you need, which can be used in the removal tool.
-
When searching by email I get no results; when using my name I get plenty. But
there is always an error and nothing is written to the file:
Use of uninitialized value in concatenation (.) or string at test.pl line 116.
Use of uninitialized value in concatenation (.) or string at test.pl line 117.
Any ideas on this matter?
(Using ActivePerl on Windows)
-
As an update, I figured out why it wasn't spitting out the actual message IDs: the script was looking for literal < > brackets, while the page source was using the HTML entities &lt; and &gt; instead.
So instead of:
$message_body =~ /.+Message-ID: <(\S+)>.+<\/pre>/s;
It should be:
$message_body =~ /.+Message-ID: &lt\;(\S+)&gt\;.+<\/pre>/s;
Then it works perfectly in Ubuntu. Thanks again!
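Since the page source has apparently used both forms at different times, a pattern that accepts either one may be more robust. This is only a sketch under that assumption; `extract_message_id` is a hypothetical helper, not part of the original script.

```perl
use strict;
use warnings;

# Extract the Message-ID from a message source, accepting the angle
# brackets either literally or as the HTML entities &lt; and &gt;.
# Returns undef when no Message-ID header is found.
sub extract_message_id {
    my ($source) = @_;
    return $source =~ /Message-ID:\s*(?:<|&lt;)(\S+?)(?:>|&gt;)/i ? $1 : undef;
}

# Both spellings yield the bare ID without brackets.
print extract_message_id("From: a\@example.com\nMessage-ID: &lt;abc\@example.com&gt;\n"), "\n";
print extract_message_id("Message-ID: <xyz\@example.org>\n"), "\n";
```

Returning undef on a failed match avoids reusing a capture left over from an earlier successful match.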
Original page: drx: Request for removal