Web Robot View of http://authors.aspalliance.com/brettb/HTTPWithPerlScriptAndASP.asp

Title: Using the HTTP Protocol With PerlScript and ASP
Description: An example of how to use the PerlScript scripting language with ASP
Keywords: perl, perlscript, asp, http, vbscript

Using the HTTP Protocol With PerlScript and ASP

Introduction

One topic often discussed by ASP programmers is how to access content from other servers using protocols such as HTTP. Such techniques have many uses, such as checking that a URL entered into a web form is valid, or pulling stock quotes from one site and publishing them on another.

Handy Tip! If you prefer to write your web applications in Perl then you will be pleased to know that Perl will be available as one of the many programming languages in the ASP.NET Framework.

There are several approaches to obtaining content from other servers, and in particular to using the HTTP protocol to programmatically access one web page from within another. ASP developers using VBScript or JScript might like to take a look at this article, which describes using an ActiveX object to achieve this. Alternatively, the AspHTTP component from ServerObjects Inc. is popular with developers.

An alternative approach is to use the PerlScript ActiveX scripting engine. This allows developers to write ASP documents in Perl, rather than the traditional VBScript or JScript. Like VBScript and JScript, Perl is an interpreted language, and is relatively easy to learn. It has long been the language of choice for many web developers, and given the long association of Perl with the Internet, it is unsurprising to find that it offers excellent support for the development of Internet applications. Perl is also a good choice when writing a script to extract and parse content from other servers, thanks to its strong text handling capabilities.

Using PerlScript

If you want to write an ASP document in PerlScript, then you may want to add the following as the first line of your document:

<%@ LANGUAGE="PerlScript" %>

All the code added to this page between the <% %> tags will then be interpreted as PerlScript instead of the server's default scripting language (which is usually VBScript).

Although you can, in theory, mix VBScript, JScript and PerlScript within the same document, this will lead to decreased server performance when compared to using a single scripting engine. More importantly, you run the risk of your ASP document outputting content from the various scripting engines in a different order to that which you might have intended.

One further warning: letting your web pages take input from other web pages carries security risks. You should, therefore, use this sample code with care, or perhaps restrict its use to an Intranet environment rather than a publicly accessible Internet site. Don't forget as well that extracting content from third party web services could bring you into legal difficulties unless you have explicit permission to do so!

Anyway, onto the code samples. The first is a function called CheckURL that will determine whether a specified URL exists. The script uses the libwww Perl library, a collection of modules that can be used to programmatically access the web.

<%
sub CheckURL {
    # Subroutine to check that a URL exists
    # Use the first argument of the function as the URL to check
    my $url_to_check = $_[0];

    # Use the libwww Perl library
    use LWP::UserAgent;

    # Create a new instance of a libwww UserAgent in order to send HTTP requests
    my $ua = LWP::UserAgent->new;

    # Set the User-Agent HTTP header for the request
    $ua->agent("Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)");

    # Set a timeout for the HTTP request (in seconds)
    $ua->timeout(3);

    # Set a maximum size for the response content (in bytes)
    $ua->max_size(8192);

    # Initialise the HTTP request
    my $request = HTTP::Request->new('GET' => $url_to_check);

    # Tell the server that HTML is the preferred response format
    $request->header('Accept' => 'text/html');

    # Send the HTTP request
    my $result = $ua->request($request);

    # Check the outcome of the HTTP request
    my $url_status;
    if ($result->is_success) {
        $url_status = "$url_to_check was detected";
    } else {
        $url_status = "$url_to_check was not detected";
    }

    # Return a string with the status of the request
    return $url_status;

}
%>

This function can then be called using the following PerlScript (changing the required URL as appropriate):

<%
$Response->Write(CheckURL("http://www.brettb.com/"));
%>
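
Given the earlier warning about taking input from other pages, it may be worth screening a URL before it ever reaches CheckURL. The following is only a minimal sketch and not part of the original article: IsAllowedURL and %allowed_hosts are hypothetical names, and the hosts listed are placeholders to be replaced with the ones you actually trust. It accepts only absolute http or https URLs whose host appears on the list.

```perl
use strict;
use warnings;

# Assumed allow-list of hosts (placeholder values, not from the article).
my %allowed_hosts = map { $_ => 1 } qw(www.brettb.com www.perl.com www.perl.org);

# Hypothetical helper: return 1 if the URL is http/https and its host is allowed.
sub IsAllowedURL {
    my ($url) = @_;
    return 0 unless defined $url;

    # Capture the host. The "@" exclusion and the check on what follows the
    # optional port reject URLs that hide the real host behind user info.
    my ($host) = $url =~ m{^https?://([^/:?#\s@]+)(?::\d+)?(?:[/?#]|\z)}i
        or return 0;

    return $allowed_hosts{ lc $host } ? 1 : 0;
}
```

Within an ASP page, CheckURL would then be called only when IsAllowedURL returns 1, with a fixed message written to the response otherwise.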

Extending the script

PerlScript offers a wealth of ways for extending the basic script shown above. For example, using the following as the last line of the CheckURL function will cause the script to return the actual HTML from the HTTP request:

return $result->content;

This is useful if you want to parse the HTML in order to extract portions of it.
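
As a rough illustration of that kind of parsing, the sketch below pulls the page title out of a string of HTML using a regular expression. ExtractTitle is a hypothetical helper, not something from the article or from libwww, and a regular expression is only a quick approximation; a proper HTML parser will cope better with unusual markup.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the first <title> element in an
# HTML string, or nothing (undef in scalar context) if there is no title.
sub ExtractTitle {
    my ($html) = @_;
    return unless defined $html;

    # /i ignores the case of the tag, /s lets "." match across line breaks.
    if ($html =~ m{<title[^>]*>(.*?)</title\s*>}is) {
        my $title = $1;
        $title =~ s/\s+/ /g;     # collapse runs of whitespace to one space
        $title =~ s/^ | $//g;    # trim a leading or trailing space
        return $title;
    }
    return;
}
```

The string passed to ExtractTitle could be the value returned by a version of CheckURL that ends with the line shown above.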

Alternatively, if you are interested in the precise error message returned from a server, then the following code will be useful:

return $result->error_as_HTML;

If a URL is not found, then the function will return the following:

An Error Occurred
404 Object Not Found

Writing a link extractor

The following code demonstrates how PerlScript can be used to extract all of the hyperlinks from a document requested using HTTP. There are two functions: ExtractLinks and LinkCollector. ExtractLinks is the main function. LinkCollector is called from ExtractLinks, and is used to gather the requested document's hyperlinks into a list. The two functions are shown below:

<%
sub ExtractLinks {

    # Subroutine to extract the links from a document
    # Use the first argument of the function as the URL to extract links from
    my $url_to_check = $_[0];

    # Use the libwww Perl library
    use LWP::UserAgent;

    # Use the link extracting HTML parser
    use HTML::LinkExtor;

    # The URL module is used here to expand URLs by including their base reference
    use URI::URL;

    # Create a list that will be used to contain details of the links within the document
    # (a package variable, so that LinkCollector can add to it)
    @LinksList = ();

    # Create a new instance of a libwww UserAgent in order to send HTTP requests
    my $ua = LWP::UserAgent->new;

    # Set the User-Agent HTTP header for the request
    $ua->agent("Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)");

    # Set a timeout for the HTTP request (in seconds)
    $ua->timeout(3);

    # Set a maximum size for the response content (in bytes)
    $ua->max_size(8192);

    # Create an instance of the link extracting HTML parser
    my $parser = HTML::LinkExtor->new(\&LinkCollector);

    # Send the HTTP request, passing each chunk of the response to the parser
    my $result = $ua->request(HTTP::Request->new(GET => $url_to_check),
        sub { $parser->parse($_[0]) });

    # Expand URLs to include the base reference
    my $base = $result->base;
    @LinksList = map { $_ = url($_, $base)->abs; } @LinksList;

    # Check the outcome of the HTTP request
    # If successful, then return a list of links in the requested document
    # otherwise, return an error message
    if ($result->is_success) {

        my $LinksList = "";
        for (@LinksList) {
            $LinksList = $LinksList . "$_<br>";
        }

        return $LinksList;

    } else {
        return "$url_to_check was not detected";
    }

}

# A short subroutine to collect the links into a list
sub LinkCollector {

    my ($tag, %attr) = @_;
    push(@LinksList, values %attr);

}
%>

The ExtractLinks subroutine can then be called using something like:

<%
$Response->Write(ExtractLinks("http://www.brettb.com/"));
%>
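
Note that LinkCollector stores every link attribute the parser reports, so image sources and similar references end up in the list alongside hyperlinks. If only the targets of anchor tags are wanted, a callback along the following lines could be used instead. This is a sketch under that assumption: AnchorCollector is a hypothetical name, and it relies on HTML::LinkExtor passing the tag name in lowercase followed by attribute name and value pairs.

```perl
use strict;
use warnings;

# Package variable shared with the callback, as in the article's code.
our @LinksList = ();

# Hypothetical variant of LinkCollector: keep only the href of <a> tags.
sub AnchorCollector {
    my ($tag, %attr) = @_;

    # Ignore everything that is not an anchor (img, link, frame and so on).
    return unless $tag eq 'a';

    # Named anchors have no href, so only store the attribute when present.
    push @LinksList, $attr{href} if defined $attr{href};
    return;
}
```

To try it, a reference to AnchorCollector would be passed to HTML::LinkExtor->new in place of the reference to LinkCollector.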

Further reading

If you want to install ActivePerl on your web server, then download it (free of charge) from the ActiveState website. The installation routine creates an extensive library of documentation, including reference guides to the Perl modules and functions described in this article.

There are plenty of online resources for learning Perl, with http://www.perl.com and http://www.perl.org being two of the best starting points. There is also a good introductory article about using Perl with ASP on ASPToday, as well as one on Web Techniques.

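You might also like to invest in one of these featured books:

ActivePerl with ASP and ADO
Learning Perl (2nd Edition)
Effective Perl Programming: Writing Better Programs With Perl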

Author details

Brett Burridge spent two years working in the University of Essex Computing Service, before moving to The Internet Applications Group in the Autumn of 1999, where he developed e-Business applications for a range of corporate clients and dot-com start ups. Brett is presently employed as an Internet developer and technical writer through his own company, Winnersh Triangle Web Solutions Limited. The company produces a number of products, including the ASP Documentation Tool, the Index Server Companion, the ASP.NET Documentation Tool, the SQL Server Documentation Tool and The Website Utility. The company is also available for web application design and development, primarily using Microsoft technologies (ASP, ASP.NET, Visual Basic, SQL Server) but also using open source technologies such as PHP, MySQL and Perl. Specialist services include development of search solutions using Microsoft's Index Server and Site Server 3.0 Search. As well as the ASPAlliance, Brett has written articles for Ariadne.ac.uk and ASPToday, and has contributed recipes to the ASP.NET Developer's Cookbook. Outside web development, Brett is interested in digital photography, tropical fishkeeping and collecting contemporary works of art by artists such as Doug Hyde.

Article history

Using the HTTP protocol with PerlScript and ASP was originally published on ASPWatch.com on April 26 2000. Republished on ASPAlliance.com on 1 October 2001.


page content copyright Brett Burridge 1998 - 2004.