[an error occurred while processing this directive]

Experimental and non working (broken) Perl code to add "noarchive" to an HTML file. [] (Add meta-noarchive to HTML-page with Perl), report, page 722085
https://www.purl.org/stefan_ram/pub/add-noarchive-to-html-with-perl (canonical URI).
Stefan Ram

Add "noarchive" to all HTML -files in a directory

This is unfinished experimental code.

Try and use completely on your own risk.

For example, when I have the following file "source.html" in the directory "C:\noarchive", the program generates the file "source.html.tmp". Removing two comment characters in the Perl source code, the original file could be replaced. The program treats all files in the directory and all subdirectories with names ending in "source.html".

source.html
<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE html  
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 
<head> 
<title>Virtual Library</title> 
</head> 
<body> 
<p>Moved to <a href="http://example.org/">example.org</a>.</p> 
</body> 
</html>

source.html.tmp
<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE html  
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 
<head> 
<title>Virtual Library</title> 
<meta name="robots" content="noarchive" /> 
</head> 
<body> 
<p>Moved to <a href="http://example.org/">example.org</a>.</p> 
</body> 
</html>

When run on the output of the previous phase, it generates the following output. The "noarchive" is not added twice.

source.html.tmp (1)
<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE html  
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 
<head> 
<title>Virtual Library</title> 
<meta name="robots" content="noarchive" /> 
</head> 
<body> 
<p>Moved to <a href="http://example.org/">example.org</a>.</p> 
</body> 
</html>

Here is a source file, that already has meta-elements

source.html (2)
<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE html  
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 
<head> 
<title>Virtual Library</title> 
<meta name="alpha" content="beta" /> 
<meta name="robots" content="index, follow" /> 
<meta name="gamma" content="delta" /> 
</head> 
<body> 
<p>Moved to <a href="http://example.org/">example.org</a>.</p> 
</body> 
</html>

And the output is:

source.html.tmp (2)
<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE html  
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 
<head> 
<title>Virtual Library</title> 
<meta name="alpha" content="beta" /> 
<meta name="robots" content="index, follow, noarchive" /> 
<meta name="gamma" content="delta" /> 
</head> 
<body> 
<p>Moved to <a href="http://example.org/">example.org</a>.</p> 
</body> 
</html>

However, beware! This program has not been extensively tested or used and probably still has many bugs!

edit.pl

# Copyright 2006 Stefan Ram. 

# This is unfinished experimental code that still needs to be 
# reviewed, tested, and edited before used for any purpose. 

# This program is free software; you can redistribute it and/or modify 
# it under the terms of the GNU General Public License as published by 
# the Free Software Foundation; either version 2 of the License, or 
# (at your option) any later version. 

# This program is distributed in the hope that it will be useful, 
# but WITHOUT ANY WARRANTY; without even the implied warranty of 
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 
# GNU General Public License for more details. 

# You should have received a copy of the GNU General Public License 
# along with this program; if not, write to the Free Software 
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 

 
 
use strict; 
use warnings; 
 
use IO::File; 
 
sub assertrobots($) 
{ if( $_[ 0 ] !~ /<\s*meta[^>]+name\s*=\s*["']?robots/ ) 
{ $_[ 0 ] =~ s{</head>}{<meta name="robots" content="" />\n</head>}; } 
print $_[ 0 ], "\n"; } 
 
sub addnoarchive_($) 
{  
print "addnoarchive_ called( \"", $_[ 0 ], "\" )\n"; 
for my $q ( '"', "'" ) 

$_[ 0 ] =~ s{ 
content\s*=\s*$q([^$q]*)$q 
}{ 
my( $values )=( $1 ); 
print "addnoarchive_ matched( $values )", "\n"; 
$values =~ s/^ //g; $values =~ s/ $//g; 
my $text = $values; 
if( length $text ) 
{ if( $text !~ /noarchive/ ) 
{ $text .= ', noarchive'; }} 
else{ $text = 'noarchive'; } 
"content=\"$text\"" 
}egisx; 
}  
 

 
# todo: this must clear the attribute first 
sub addnoarchive($) 
{  
for my $q ( '"', "'" ) 

$_[ 0 ] =~ s{ 
<meta([^>]+)name\s*=\s*${q}robots${q}?([^>]*)> 
}{ 
print STDERR "addnoarchive matched", "\n"; 
my( $left, $right )=( $1, $2 ); 
addnoarchive_($left); 
addnoarchive_($right); 
$left =~ s/^ //g; $left =~ s/ $//g; 
$right =~ s/^ //g; $right =~ s/ $//g; 
my $text = $left; 
if( length $text ){ $text .= ' ' . $right; } 
else{ $text = $right; } 
"<meta name=\"robots\" $text>" 
}egisx; 

print $_[ 0 ]; 

 
sub noarchive($$) 
{ my( $text, $status )= @_; 
print "noachive", $status, "\n"; 
if( $status ) 
{ if( $status > 0 ){ assertrobots( $_[ 0 ]); addnoarchive( $_[ 0 ]); } 
if( $status < 0 ){ removenoarchive( $_[ 0 ]); hiderobots( $_[ 0 ]); }} 
return 1; } 
 
sub modify($) 
{ return noarchive( $_[ 0 ], 1 ); } 
 
sub srp($$) 
{ my( $source, $target )= @_; 
my ${kv_in} = IO::File->new(); 
{ if( open( $kv_in, "<$source" )) 
{ my $infile; undef $/;$infile = <$kv_in>; 
if( close( $kv_in )) 
{ { my $directive; 
if( modify( $infile )) 
{ my ${kv_out} = IO::File->new(); 
if( open( $kv_out, ">$target" )) 
{ print $kv_out $infile; 
if(close( $kv_out )) 
{ # unlink( $source ); 
# rename( $target, $source ); 
print $source; }}}}}}}} 
 
sub file($) 
{ my( $path )= @_; 
if( $path =~ /source\.html$/ ) 
{ print "srp " . $path . "\n"; 
srp( $path, $path . ".tmp" ); 
print $path . "\n"; }} 
 
sub dodir($) 
{ my( $entries )= @_; 
foreach( @$entries ) 
{ if( -f ) # file 
{ file( $_ ); } 
if( -d ) # directory 
{ directory( $_ ); }}} 
 
sub directory($) 
{ my( $path )= @_; 
if( opendir( DIR, $path )) 
{ my @entries = map{"$path/$_"} grep !/^(\.|\.\.)$/, readdir DIR; 
dodir( \@entries ); 
closedir DIR; }} 
 
sub start 
{ my $path = shift @ARGV if @ARGV;  
$path ||= 'C:\\noarchive'; 
directory( $path ); return 0; } 
 
start; 
 
 
# $_[0] =~ s/example/example/g; 
 
############################################################################## 
 
=head2 Name 
 
assertrobots - assert a meta-robots element  
 
=head2 Usage 
 
assertrobots( $html_document ); 
# $html_document an HTML document 
 
=head2 Description 
 
This sub asserts that an HTML document contains a robot-meta-Element: 
If none is found, it is created and inserted at the end of the 
head element. 
 
This sub uses heuristics, it does not parse the HTML properly, so 
it will fail under some conditions. 
 
=head2 Parameters 
 
=over 4 
 
=item $html_document 
 
A html_document text, which might be modified by the sub invocation. 
 
=back 
 
=head2 Result 
 
(none) 
 
=cut

About this page, Impressum  |   Form for messages to the publisher regarding this page  |   "ram@zedat.fu-berlin.de" (without the quotation marks) is the email-address of Stefan Ram.   |   Beginning at the start page often more information about the topics of this page can be found. (A link to the start page appears at the very top of this page.)  |   Copyright 2004 Stefan Ram, Berlin. All rights reserved. This page is a publication by Stefan Ram. slrprd, PbclevtugFgrsnaEnz