Wednesday, June 20, 2007

INTERNET, INTRANET AND EXTRANET APPLICATIONS

Introduction
The Internet is the worldwide, publicly accessible network of interconnected computer networks that transmit data by packet switching using the standard Internet Protocol (IP). It is a "network of networks" that consists of millions of smaller domestic, academic, business, and government networks, which together carry various information and services, such as electronic mail, online chat, file transfer, and the interlinked Web pages and other documents of the World Wide Web.
The Internet and the Web hold tremendous potential for a whole array of activities including online distance education, global digital libraries, e-commerce, Internet telephony, electronic publishing, electronic journals, virtual museums, etc. The Web has particularly established itself as a powerful medium for self-education for people in isolated or remote areas because of its ease of use, familiarity among the masses, availability of tools and wide accessibility. The Web has become the most successful networked multimedia hypertext-based system of our time. HTML, the de facto language of the Web, is extremely simple yet powerful to use. Further, static HTML web pages can be transformed into vibrant, dynamic and interactive web creations using ever-evolving web technologies like CGI scripts, Perl, Java, JavaScript, ASP, DHTML, XML and Open Database Connectivity (ODBC) drivers.

The rapid growth in web technology and its ever-increasing usage have given librarians and educators unprecedented opportunities to provide information to students not only within the four walls of libraries and classrooms but also in the comfort of their homes all over the globe. The changes, mainly driven by new technological innovations and the new learning environment, have presented a scenario where students have access to a vast array of information in many fields from experts all over the world.

The Internet allows us to share information and resources such as government documents, electronic journals, electronic books, media publishing, human anatomical images, computer software, bibliographic and full-text databases, speeches, live concerts, audio and video clippings. The dynamic nature of the Internet is derived from scientists, researchers and the general public contributing their time, resources and energies to each other. Typical users consult electronic resources at near and distant libraries, download computer shareware and software upgrades, read and print publications, make travel arrangements and purchase goods and services. Electronic mail and newsgroups help users communicate with each other on topics of mutual interest. Discussion forums and listservs provide a platform for people with common interests to engage in thoughtful discussions. A few popular uses of the Internet are as follows:

 Retrieving information required for day-to-day work from reference sources available on the web, like dictionaries, encyclopaedias, etc.;
 Retrieving information from databases of various libraries like the Library of Congress, British Library, Indian Institute of Science, IITs and several other libraries;
 Searching commercial and non-commercial databases like MEDLINE, INSPEC, COMPENDEX, etc.;
 Accessing electronic books, e-journals and other e-documents required for research work from the web sites of commercial and non-commercial publishers;
 Consulting social and economic statistical data, such as census information, daily exchange rates, and government budgets and reports;
 Getting documents on fine arts and music, including digital images of art, video and audio;
 Exchanging messages with people across the world;
 Searching for computer shareware, freeware, and commercial software;
 Sending or receiving sound, animation and picture files across the Internet;
 Setting up temporary or permanent discussions or work-oriented groups;
 Distributing or reading electronic newsletters, newspapers, bulletins and similar publications, products and services;
 Trading with people or other organizations and other e-commerce activities; and
 Chatting with people using software like Yahoo Messenger, Hotmail Messenger, etc.

The explosive growth of the Internet and the World Wide Web in recent years has had its impact on the information profession too. It has brought about a sea change in information-seeking approaches as well as in the mode of dissemination of information. As librarians and information professionals, our prime responsibility is to acquire, organise, preserve, retrieve, and disseminate pertinent information to our clientele. This global forum, an emerging medium of communication and a proven and concrete technology for sharing and exchanging information, has a lot to offer to information professionals.

The Internet works on client-server technology, i.e. on two types of computer programs: servers and clients. Servers are programs that host resources to serve the clients, and clients are programs that users use to access these resources. E-mail, listserv / mailing lists, Usenet / newsgroups, FTP, Telnet, Gopher, Archie, WWW, etc., are among the prominent services of the Internet. Each type of service on the Internet has its own client. For example, to access the WWW, we need to use a Web client such as “Netscape” or “Internet Explorer”. The Internet connects thousands of networks all over the world. Different types of computers on these networks are made to work together seamlessly using TCP/IP (Transmission Control Protocol / Internet Protocol). TCP/IP, in turn, is the common name for a collection of more than 100 protocols used to connect computers and networks.

2. World Wide Web

The World Wide Web, known as WWW, W3 or simply the Web, is one of several Internet resources developed to help people publish, organize and provide access to information on the Internet. The Web was first developed by Tim Berners-Lee in 1989 while working at CERN, the European Particle Physics Laboratory in Switzerland, and has since become the most powerful and popular resource discovery tool on the Internet. The WWW can be defined as a hypertext, multimedia, distributed information system that provides links to hypertext documents, as well as to many other Internet tools and databases. Contrary to some common usage, the Internet and the World Wide Web are not synonymous: the Internet is a collection of interconnected computer networks, linked by copper wires, fiber-optic cables, wireless connections, etc.; the Web is a collection of interconnected documents, linked by hyperlinks and URLs. The World Wide Web is accessible via the Internet, as are many other services including e-mail, file sharing, etc.
2.1. Importance of the Web

The World Wide Web (WWW) is important for libraries because it provides an extremely powerful method of organizing and providing access to information. The Web can provide a single interface to a large variety of information resources and systems including textual (unformatted or formatted) documents, images, sound and video files. The Web can be used to provide an interface to other Internet services like TELNET, FTP and Gopher. It can also be interfaced to online databases. There are several features unique to the Web that make it the most advanced hypertext-based information system on the Internet. These features are:

2.11. The Web is a Hypertext System: The Web is a hypertext system, in contrast to the hierarchical menu system used by earlier Internet tools such as Gopher. A user on the Web moves from one document to other related documents through embedded links (called hyperlinks); such hyperlinked words or phrases, when clicked, call up another document on that topic. Instead of moving from menu to menu, as in Gopher, users of the Web can jump directly from document to document by clicking on hypertext links.

2.12. The Web is a Multimedia System: The Web is the most successful networked multimedia hypertext-based system of our times. Web technology allows incorporation of various media types besides structured text. A good multimedia interactive document is a product consisting of structured text, video clips, animation, pictures, graphics, diagrams, programs, sound, etc. With the advent of graphical browsers, the Web has become a multimedia system, combining different types of media into one document. Before graphical Web browsers (e.g. Netscape, Internet Explorer), most of the information available on the Internet was in the form of simple text devoid of any elements common to the printed page, such as text in bold and italics, pictures and other graphical content. Web documents may contain:

 Normal text
 Features such as large fonts, bold, italics, indents
 Images such as pictures, graphics, logos, illustrations
 Audio content such as sounds, music, commentary, voice messages
 Video content such as movie clips, animations, or computer generated simulations.

2.13. The Web is a Distributed System: The Web is a distributed system for delivering linked documents over the Internet. It is called a distributed system because information can reside on different computers around the world, yet can easily be linked together using hypertext. The Web uses hypertext to create links from one resource to another. From the perspective of users, one set of related documents may appear to reside in one location, but in reality, the successive pages they read may have been requested from anywhere in the world.

2.14. The Web Incorporates other Internet Tools: The Web incorporated the capabilities of most of the earlier tools, and added the ability to handle various media types. The Web can provide links to other types of Internet tools, such as WAIS, Gopher, FTP and TELNET. A Web page can provide links to other relevant information resources on the network, regardless of whether that information is available on a gopher, through TELNET, or at an FTP site. In this way, the Web and its browsers become a method to seamlessly provide access to information available through many different Internet tools.

2.15. The Web Provides an Interface to other Database Systems: A particularly powerful feature of the Web is that it can act as an interface to database systems connected to the Internet. Three elements are needed to create this interface.

i) Forms: Forms are used to collect information through the Web browser. Forms are a method of creating input boxes on a Web page into which users can type information, or select among alternatives.

ii) Database System: An RDBMS such as MS Access, MS SQL Server, MySQL, Oracle or PostgreSQL can be used as the back-end database.

iii) Common Gateway Interface (CGI): The CGI program sits between the Web browser and the database. It takes the information gathered from the Web browser and passes it to the database. Once the request is processed, the CGI program passes the result back to the Web browser in a format that it can display.
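The three elements can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the `books` table, its sample rows and the `gateway` function are hypothetical, an in-memory SQLite database stands in for the back-end RDBMS, and a real CGI program would read the form data from its environment and write the page to standard output rather than being called as a plain function.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs


def make_catalogue():
    """Create a throwaway in-memory database standing in for the back end."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE books (title TEXT, author TEXT)")
    # Sample rows for illustration only.
    db.executemany(
        "INSERT INTO books VALUES (?, ?)",
        [("Weaving the Web", "Berners-Lee"), ("HTML Basics", "Sample Author")],
    )
    return db


def gateway(query_string, db):
    """Play the role of the gateway program: form data in, HTML out."""
    # 1. Decode what the browser sent from the form, e.g. "author=Berners-Lee".
    fields = parse_qs(query_string)
    author = fields.get("author", [""])[0]
    # 2. Pass the request to the database using a parameterised query.
    rows = db.execute(
        "SELECT title FROM books WHERE author = ? ORDER BY title", (author,)
    ).fetchall()
    # 3. Format the result as HTML that the browser can display.
    items = "".join(f"<li>{escape(title)}</li>" for (title,) in rows)
    return f"<html><body><ul>{items}</ul></body></html>"


if __name__ == "__main__":
    print(gateway("author=Berners-Lee", make_catalogue()))
```

The parameterised query (the `?` placeholder) and the call to `escape` keep text typed into the form from being interpreted as SQL or as HTML.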

Computer and communication technology, with its capabilities of parallel processing, multitasking, parallel consultation and parallel knowledge navigation, creates a semblance of the artificial intelligence and interactivity necessary for developing an interactive learning interface. Coinciding with the availability of software, hardware and networking technology, the advent of the World Wide Web (WWW), its ever-increasing usage and highly evolved browsers have paved the way for the creation of a global digital library. The increasing popularity of the Internet and developments in web technologies act as a catalyst for the development of highly interactive library services.

3. How Does the Web Work?

The most important concepts and underlying mechanisms that make the Web work are client-server architecture, the Hypertext Transfer Protocol (HTTP), Hypertext Markup Language (HTML) and Uniform Resource Locators (URLs). These concepts are described below:

3.1. Client-Server Architecture

The client-server architecture is based on the principle that a “client” program installed on the user’s computer (called the client) communicates with a “server” program installed on a host computer to exchange information through the network. The client-server model involves two separate but related programs, i.e. client and server. The client program is loaded on the PCs of users connected to the Internet, whereas the server program is loaded on the “host” (usually a PC with large storage capacity and RAM, a mini-computer or a mainframe computer) that may be located at a remote place. The concept of client-server computing has particular importance on the Internet because most of its programs are built using this design. A server is a program that “serves” (or delivers) something, usually information, to a client program. A server usually runs on a computer that is connected to a network. The client-server architecture is discussed in detail in Block 4, Unit 12.
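The division of labour between the two programs can be sketched in Python. This is a toy illustration, not a real Internet service: the `RESOURCES` table and the function names are hypothetical, and both programs run on one machine, joined by `socket.socketpair()` instead of a network connection.

```python
import socket
import threading

# Hypothetical resources the server is willing to hand out.
RESOURCES = {"greeting": "Hello from the server"}


def serve_one(conn):
    """Server side: read one request and send back the matching resource."""
    with conn:
        name = conn.recv(1024).decode("utf-8").strip()
        reply = RESOURCES.get(name, "ERROR: no such resource")
        conn.sendall(reply.encode("utf-8"))


def request(conn, name):
    """Client side: send the name of a resource and return the reply."""
    with conn:
        conn.sendall(name.encode("utf-8"))
        # Tell the server the request is complete, then read until it hangs up.
        conn.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks).decode("utf-8")


def demo(name):
    """Run one client request against one server thread."""
    # socketpair() returns two sockets that are already connected to each other.
    server_end, client_end = socket.socketpair()
    worker = threading.Thread(target=serve_one, args=(server_end,))
    worker.start()
    try:
        return request(client_end, name)
    finally:
        worker.join()


if __name__ == "__main__":
    print(demo("greeting"))
```

On the Internet the two ends would sit on different machines and the client would connect to the server's address, but the roles stay the same: the client asks, the server answers.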

3.2. Hypertext Transfer Protocol (HTTP)

The Hypertext Transfer Protocol (HTTP) is the set of rules for exchanging files (text, graphic images, sound, video, and other multimedia files) on the World Wide Web. As its name implies, an essential concept of HTTP is the idea that files can contain links or references to other files, whose selection leads to further transfer requests. Any Web server machine contains, in addition to the HTML and other files it can serve, an HTTP daemon, a program that is designed to wait for HTTP requests and handle them when they arrive. The Web browser is an HTTP client, sending requests to server machines. When a user requests a file through the browser by either "opening" a Web file (typing in a Uniform Resource Locator) or clicking on a hypertext link, the browser builds an HTTP request and sends it to the Internet Protocol address indicated by the URL. The HTTP daemon in the destination server machine receives the request and, after any necessary processing, returns the requested file. The HTTP protocol is discussed in detail in Block 4, Unit 12.
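The request that a browser builds, and the reply it gets back, are plain text. The sketch below shows roughly what they look like; it is a simplification that assumes HTTP/1.0, ignores query strings, and uses hypothetical function names. Nothing is sent over the network.

```python
from urllib.parse import urlsplit


def build_request(url):
    """Return (host, port, request_text) for a simple HTTP/1.0 GET."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80      # 80 is the default port for HTTP
    path = parts.path or "/"     # an empty path means the site's root
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    return host, port, request


def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    # A blank line separates the status line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


if __name__ == "__main__":
    print(build_request("http://www.iitd.ac.in/acad/library/index.html")[2])
```

The browser would open a connection to the returned host and port, send the request text, and hand the reply to something like `parse_response` before displaying the body.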

3.3. Hypertext Links: Uniform Resource Locators (URL)

Hypertext links are words, phrases, symbols, maps or any other items in a Web document that are linked to a different place in the same document or to another Internet resource. Hypertext links may be underlined, highlighted in colour, or appear as icons, to distinguish them from the surrounding text. A link must be “selected” by clicking on it with a mouse so as to call up another document or part of a document. A hypertext link embeds a URL into an object (such as text or an image). Links are based on a standard called the Uniform Resource Locator (URL), a compact string representation for a resource available on the Internet. URLs contain all of the information needed for the client to find and retrieve an HTML document. An example of a link to a URL is:

[Figure: an HTML link to IIT Delhi with its parts labelled: HTML tag for linking, protocol (http), Internet address, hypertext link text (IIT Delhi), and HTML tag (end)]


The link shown above has four parts:

i) The protocol used to connect to the remote server. In this example, the protocol is HTTP, the protocol used to connect to Web servers. The protocol could also be Gopher, FTP or TELNET, indicating that the link is to one of these Internet tools;

ii) The Internet address of the server where the document resides. In this case, the address is www.iitd.ac.in;

iii) The directory on the server where the document is located, called the document path. In this case, the path is /acad/; and
iv) The filename of the document itself. In the example, it is index.html (default file) where the html extension indicates that the document is marked up with HTML tags.
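The four parts can also be pulled out of a link programmatically. The sketch below uses Python's standard HTML parser; the anchor markup in the demo is an assumption pieced together from the parts listed above, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from posixpath import split as split_path
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the URL embedded in every <a href="..."> tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)


def link_parts(url):
    """Split a link's URL into protocol, address, path and filename."""
    parts = urlsplit(url)
    directory, filename = split_path(parts.path)
    if not directory.endswith("/"):
        directory += "/"
    return {
        "protocol": parts.scheme,
        "address": parts.hostname,
        "path": directory,
        "filename": filename,
    }


if __name__ == "__main__":
    collector = LinkCollector()
    # Assumed markup for the IIT Delhi link discussed in the text.
    collector.feed('<A HREF="http://www.iitd.ac.in/acad/index.html">IIT Delhi</A>')
    collector.close()
    print(link_parts(collector.urls[0]))
```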

 Uniform Resource Locator (URL)

The World Wide Web uses Uniform Resource Locators (URLs) to specify the location of files. A URL includes:

 the protocol being used (e.g., http, ftp, gopher, etc.), for example: http://
 the host name, for example: www.iitd.ac.in
 the port number (generally omitted, unless otherwise specified)
 and the directories and file name, for example: /acad/library/index.html

The URL would look like this:

http://www.iitd.ac.in/acad/library/index.html

HTTP stands for HyperText Transfer Protocol and is the protocol that the World Wide Web servers use to send HTML documents over the Internet.
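The port number is the one component that usually does not appear in the URL. A short sketch of how a client might fill it in and how a URL is assembled from its components follows; the `DEFAULT_PORTS` table and function names are hypothetical, and the address with port 8080 is a made-up example.

```python
from urllib.parse import urlsplit, urlunsplit

# Well-known default ports for a few of the protocols mentioned in the text.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70, "telnet": 23}


def describe(url):
    """Return (protocol, host, port, path), filling in an omitted port."""
    parts = urlsplit(url)
    port = parts.port
    if port is None:
        port = DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path


def build(scheme, host, path, port=None):
    """Assemble a URL from its components; the port is optional."""
    netloc = host if port is None else f"{host}:{port}"
    return urlunsplit((scheme, netloc, path, "", ""))


if __name__ == "__main__":
    print(describe("http://www.iitd.ac.in/acad/library/index.html"))
    print(build("http", "www.iitd.ac.in", "/acad/", port=8080))
```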

4. Web Servers

The Internet works on the client / server model. A server is a computer system that is accessed by other computers and / or workstations at remote locations. Usually, a server contains data, datasets, databases and programs. The server computer is also called a “host” since it is configured to host datasets, files and databases, receive requests for them from client machines and serve them. The term “host” means any computer that has full two-way access to other computers on the Internet. All computers that host web sites are host computers or servers since they “host” information and “serve” client machines. Clients and servers are two ends of the Web, each with its own supporting software. A Web server is a software application that uses the Hypertext Transfer Protocol. There are many Web server software applications, including open-source software from Apache, and commercial applications from Microsoft, Oracle, Netscape and others. A Web server hosts or provides access to content and responds to requests received from Web browsers. Every Web server has an IP address and usually a domain name, e.g. www.iitd.ac.in. Server software runs exclusively on server machines, handling the storage and transmission of documents. In contrast, client software such as Netscape, Internet Explorer, etc. runs on the end-user’s computer, accessing, translating and displaying documents.

Web servers deliver HTML documents for viewing by Web browsers. The server enables users on other sites to access documents that it hosts. Web servers can run on any hardware platform. There are servers that are specifically designed for Macintosh computers, PCs, Silicon Graphics, and various other platforms. The most important software is the Web server itself. Just as a Web server can run on a number of hardware platforms, it can also run under several operating systems, including MS Windows, Windows NT, Unix, Linux and Macintosh.

A Web server is responsible for document storage and retrieval. It sends the document requested (or an error message) back to the requesting client. The client interprets and presents the document; that is, the client is responsible for document presentation. The language that Web clients and servers use to communicate with each other is called the Hypertext Transfer Protocol (HTTP). All Web clients and servers must be able to speak HTTP in order to send and receive hypermedia documents. For this reason, Web servers are often called HTTP servers, or HTTP daemons (HTTPD).
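The server's side of this exchange, returning either the requested document or an error message, can be sketched as a single function. This is a simplified illustration and not how any particular Web server is implemented: the `DOCUMENTS` store and function names are hypothetical, and documents are kept in a dictionary rather than on disk.

```python
# Hypothetical document store: request path -> HTML text.
DOCUMENTS = {"/index.html": "<html><body>Welcome</body></html>"}


def make_response(code, reason, body):
    """Format a minimal HTTP/1.0 response carrying an HTML body."""
    return (
        f"HTTP/1.0 {code} {reason}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body.encode('utf-8'))}\r\n"
        "\r\n" + body
    )


def respond(request_line, documents=DOCUMENTS):
    """Return the response text for one request line such as 'GET / HTTP/1.0'."""
    try:
        method, path, _version = request_line.split()
    except ValueError:
        return make_response(400, "Bad Request", "<h1>400 Bad Request</h1>")
    if method != "GET":
        return make_response(501, "Not Implemented", "<h1>501 Not Implemented</h1>")
    if path == "/":
        path = "/index.html"     # serve the default file for the root
    body = documents.get(path)
    if body is None:
        # Document not stored here: send an error message instead.
        return make_response(404, "Not Found", "<h1>404 Not Found</h1>")
    return make_response(200, "OK", body)


if __name__ == "__main__":
    print(respond("GET /index.html HTTP/1.0"))
```

Note that the server only retrieves and sends the document; how it looks on screen is left entirely to the client.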

Web documents are written in a text-formatting language called Hypertext Mark-up Language (HTML). HTML is used to create hypertext documents for use on the Web. Basically, it is a set of “mark-up” symbols or codes inserted in a Web file that tell the Web browser how to display a Web page for the user. HTML is a language, neither an application nor a software package. An HTML document is simply a set of text and instructions that requires a Web browser to be displayed.

5. Web Browsers

Web browsers are the applications that allow a user to view HTML documents from a computer connected to the Internet. Software such as Netscape Navigator, Microsoft Internet Explorer, etc., reads files created with HTML (Hypertext Mark-up Language) and displays interactive Web pages to the user. The first widely used graphical browser, NCSA Mosaic, was developed at the National Center for Supercomputing Applications. Its easy-to-use point-and-click interface helped popularize the Web. The availability of ready-to-use, publicly available, user-friendly graphical Web browsers for all prevalent platforms paved the way for unprecedented growth of Internet applications and services. Standard WWW clients such as Netscape Navigator and Internet Explorer are upgraded regularly for added functionality such as an e-mail client, support for Java and ActiveX, and the ability to view important document formats without having to install plug-ins for them. These browsers solved the maintenance problem, allowing developers to concentrate fully on the server side and not bother with the client side. These browsers are freely available and easy to use, eliminating the need for extensive support and user training. The two important graphical browsers are Microsoft Internet Explorer and Netscape Navigator. Both are fast and both have integrated audio and video. Most browsers can be downloaded at no charge.

There is no standard way of viewing or navigating the Web. A variety of Web browsers exist. Most browsers offer most of the same functionality, although there are some differences in levels of support and overall performance. Most browsers are still being updated and improved, with new releases every two or three months. A number of Web browsers are available for each computing platform, including Lynx for terminal-based users (without graphics support). The basic capabilities of a browser are to retrieve documents from the Web, jump to links specified in the retrieved document, and save and print the retrieved documents.

A Web browser is a client program that uses the Hypertext Transfer Protocol (HTTP) to make requests to Web servers on behalf of the user. Many of the user interface features of the original Mosaic browser went into the first widely used commercial browser, Netscape Navigator. Microsoft followed with its Internet Explorer. Today, these two browsers are highly competitive and most Internet users are aware of these two browsers only. Although the online services, such as America Online, CompuServe and Prodigy, originally had their own browsers, virtually all now offer the Netscape or Microsoft browser. Lynx is a text-only browser for UNIX shell and VMS users. Another recently offered browser is Opera.

A Web browser contains the basic software a user needs in order to find, retrieve, view, and send information over the Internet. Some of the important functions of a browser are:

 Send and receive electronic mail;

 Read messages from newsgroups, forums about thousands of topics in which users share information and opinions;

 Browse the World Wide Web (or Web) where a user can find and view a rich variety of text, graphics and interactive information.
 Browsers such as Microsoft Internet Explorer version 5.0 and later include additional Internet-related software. For example, Internet Explorer 5.0 also incorporates:

o Windows Media server
o NetMeeting, conferencing software
o ActiveX controls
o Chat
o ActiveMovie application programming interface
o Active Channel webcast
o Subscriptions
o Dynamic HTML
o Windows Media

The features mentioned above allow a user to see and hear live and recorded broadcasts such as concerts or breaking news with synchronized audio, graphics, video, URLs, and script commands. Streaming technology allows a user to see or hear the information as it arrives instead of having to wait for the entire file to download.

The browser performs two tasks: first it identifies HTML elements, then it translates the identified elements into actions. For example, it may identify the HTML bold element and then display a block of text in bold format. Other actions might be to display an image, add a blank line between text, or link to another document. Many of these actions can be handled by the browser itself; for example, within its viewing area it can display text and some types of images (if it uses a graphical user interface such as Windows).
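The two tasks, identifying elements and translating them into actions, can be sketched with Python's standard HTML parser. This toy renderer is only an illustration: it produces plain text, shows bold text in capitals because a terminal has no bold typeface, and handles just three elements. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Identify HTML elements and translate each one into a display action."""

    def __init__(self):
        super().__init__()
        self.output = []
        self.bold = False

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.bold = True                 # action: switch bold on
        elif tag == "br":
            self.output.append("\n")         # action: start a new line
        elif tag == "img":
            # Action: a text-only display shows the image's description.
            alt = dict(attrs).get("alt") or "image"
            self.output.append(f"[{alt}]")

    def handle_endtag(self, tag):
        if tag == "b":
            self.bold = False                # action: switch bold off

    def handle_data(self, data):
        # Ordinary text is displayed as is, or in capitals while bold is on.
        self.output.append(data.upper() if self.bold else data)


def render(html):
    """Return the plain-text rendering of a fragment of HTML."""
    renderer = TextRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.output)


if __name__ == "__main__":
    print(render("Read <b>this</b> first<br>then go on"))
```

A graphical browser does the same kind of dispatch, only with far more elements and with drawing operations in place of string output.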

In addition to Web servers, Web browsers can also access Gopher, FTP, and WAIS servers. Essentially, besides HTTP, browsers understand the protocols associated with Gopher, FTP and WAIS. Thus, browsers provide a common navigational interface between all these systems, seamlessly executing the appropriate protocol behind the scenes.

There are several Web browsers or Web clients available to surf the Internet. Some of the important ones are:

 Mosaic Version 2.1.1 (http://archive.ncsa.uiuc.edu/SDG/Software/WinMosaic/HomePage.html)

At one time, first and foremost among Web clients was the Mosaic graphical interface. It was developed in 1993 by the National Center for Supercomputing Applications at the University of Illinois. Before Mosaic, all interfaces to the Web were simple text-based, line-by-line interfaces. They were hypertext, but not graphical or multimedia. When a Windows version of Mosaic became available to Internet users for free, suddenly the Web became the hottest information system on the Network because it was so much more powerful.

 Netscape Navigator 8.1 (http://www.netscape.com/)

Netscape Navigator was developed by the same people who created Mosaic at NCSA. The Netscape 7 browser has a tabbed user interface that allows easy switching from one open Web page to another. A user can create bookmarks that open a specific set of tabs. Another convenient feature is one-click search: highlight a word (but not a link) in the browser window, right-click on it, and start a search for it from any search engine. The new version of Netscape has also implemented the "Sidebar" pane that runs down the left edge of the screen. Internet Explorer does the same thing (click on Favourites or History and a thin left-hand window opens). Netscape's Sidebar provides tabbed access to addresses, bookmarks, news, history, and a variety of other useful things. Netscape has a mail client and address book that are perfectly adequate. The biggest difference between Netscape 7 and Internet Explorer is integration. Because Microsoft presumes ownership of your desktop, it has less need to pack as many applications into its Web browser. Netscape, on the other hand, ties together e-mail, browsing, and instant messaging in one application. For example, both AOL's instant messaging client and ICQ are integrated into the browser. Microsoft's Messenger client is not fully integrated. In Netscape 7, the integration feels smooth and natural.

 Internet Explorer 7.0 (http://www.microsoft.com)

Internet Explorer is Microsoft’s Internet browser that comes packaged with the Windows operating system. It can also be downloaded from Microsoft's Web site free of cost. About 75-80% of Internet users use Internet Explorer. The new version of Internet Explorer integrates Outlook Express. It provides a private, reliable, and flexible browsing experience and the freedom to experience the best of the Internet for users of Windows XP, Windows Millennium Edition (Windows Me), Windows 2000 Professional, Windows 98 and Windows 98 Second Edition, and Windows NT® 4.0 Workstation. It also includes a free copy of the Advanced Searchbar, an Internet Explorer toolbar that allows users to quickly access and search over 60 search engines and is packed with features including blocking of pop-up windows.

 Avant Browser v9.02 (http://www.avantbrowser.com/)

This browser add-on runs on top of Internet Explorer. An integrated pop-up stopper and Flash animation filter protect users from unwanted distractions. Avant Browser supports tabbed multi-window browsing, i.e. the tabbed interface lets a user open several sites inside one browser and makes navigation easier. The built-in Google search engine lets a user search right from the browser's taskbar. It has several built-in features such as a records eraser that protects privacy by deleting typed addresses, auto-complete passwords, cookies, the history of visited web sites, temporary Internet files and search keywords. It has options to block downloads of pictures, videos, sounds and ActiveX components. With these options users can control their bandwidth and speed up page loading. It offers additional mouse functions: for example, if a link is clicked with the middle mouse button, the link is opened in a new window in the background. It supports Real Full Screen Mode and Alternative Full Desktop Mode. It is fully Internet Explorer compatible and supports all Internet Explorer functions, including cookies, ActiveX controls, JavaScript, RealPlayer and Macromedia Flash. Internet Explorer's favourites are automatically imported to Avant. Avant Browser supports many different languages.

 Enigma Browser (http://www.suttondesigns.com/)

Enigma Browser incorporates a large collection of powerful features such as a built-in pop-up stopper, skinned window frame, form filler, site groups, quick-search, auto login, hidden sites, built-in commands and scripting, online translation, script error suppression, blacklist / whitelist filtering and URL aliases. It offers convenient and comfortable browsing. It can turn Flash animation on or off. Enigma provides convenient access to major search engines through its Quick-Search Bar. It has a built-in VBScript / JScript / HTML / text editor. It can hide sites and show a site at the user's request. It provides auto login, i.e. it automatically connects and logs in to a specified website with just one click. Enigma integrates seamlessly with online translation engines and dictionaries. It can also suppress script error message dialogs.

 Crazy Browser v1.05 (http://www.crazybrowser.com)

Crazy Browser facilitates browsing multiple web sites at once. It blocks advertisements. Users have the option to turn off multimedia and browse the web in text mode. Users can search on a number of search engines that come with the program. It incorporates a Smart Popup Filter. It supports tabbed multi-window browsing.

 Automatic Search Browser (http://www.4comtech.com/)

AutomaticSearch is a search-themed web browser that automatically finds related links, subjects, and topics associated with the website currently being viewed. It features an integrated search engine utilizing the popular engines (Dmoz, Google, Yahoo, All the Web, MSN, Lycos, Hotbot, etc.) and allows users to quickly switch among these search results using tabs. Users can also save and access their favourite websites easily by using toolbar buttons.

 Mozilla v1.7.2 (http://www.mozilla.org)

Mozilla, developed by the Mozilla.org open-source community, is a cross-platform product with support for Windows, Linux, and Macintosh 8/9/X. It incorporates a filter to stop pop-up advertisements. It supports tabbed multi-window browsing that lets a user open new pages easily: instead of forcing a user to open a new window and then click back to see the previous screen while the new window loads, Mozilla offers a tab for the new page and loads it in the background, letting the user stay focused on the work at hand. Another welcome advance is the ability to turn off pop-up ads and animated GIFs. Besides the browser, Mozilla includes an instant-messaging application, an e-mail client, Web-composer software, and some nifty cookie-management and anti-spam features.

 Opera (http://www.opera.com/)

The Opera browser is designed for Windows, with a built-in pop-up stopper to stop unwanted pop-up pages. It supports tabbed multi-window browsing. It has an integrated Google search engine that lets a user search right from the browser's taskbar. It has a built-in mail client. Opera is extremely well designed, with minimal resource use in terms of hard disk and memory requirements. Opera is fully customizable, and its page magnification capability and graphics handling make it an alternative to Internet Explorer or Netscape.

 EasyBrowse v2.0 (http://www.vrameen.com/)

EasyBrowse is a free Internet browser that works on all Windows platforms (95 and above). The new version adds a pop-up disabler option to a standard list of features including plug-ins, direct linking to 11 search engines, a plug-in checker, a large browsing window, favorites and a history list.

 NeoPlanet Browser (http://www.neoplanet.com/)

NeoPlanet offers a great browsing experience while allowing users to customize the interface. The NeoPlanet browser has an embedded email client that has most of what a user needs. NeoPlanet facilitates download management and offers QuickSearch features. A key difference between NeoPlanet and other alternative browsers like Opera is that NeoPlanet is not a full-fledged Web browser but a front-end for Internet Explorer. This means that every technology supported by Internet Explorer, from Java applets to dynamic HTML to ActiveX controls, is supported by NeoPlanet as well. NeoPlanet also offers a powerful search function and a customizable channel bar to deliver "the best of the Web in 3 clicks or less".

 Lynx (http://www.browser.lynx.org/)

Lynx is a text-based, full-screen interface to the Web. Arrow keys, tabs, and the cursor are used instead of a mouse to move around and select items. The Lynx interface does not support multimedia, so pictures, icons, maps and other graphical elements cannot be viewed.

5.1. Plug-ins or Helper Programs

The Web is practically an ultimate tool of integration. It facilitates the incorporation of all kinds of media files into a Web page. A web author can incorporate text files (formatted or unformatted), images, video clips, audio files, graphics, animations, and other types of content. The browser cannot handle all these files and formats on its own and thus requires additional software programs to process them, for example to play a sound file or a full-motion video, or to display an obscure image format.

Initially, the Netscape browser allowed users to download, install, and define supplementary programs that played sound or motion video or performed other functions. These were called helper applications. However, a helper application runs as a separate program and requires a second window to be opened. A plug-in application, by contrast, is recognized automatically by the browser, and its function is integrated into the main HTML file that is being presented.

Plug-ins or helper applications are external software programs that allow Web users to view or hear multimedia presentations, regardless of platform. Plug-ins can easily be installed and used as part of a Web browser. They extend and enhance the capabilities of Web browsers such as Netscape and Internet Explorer, and are needed to handle many of the newer hypermedia formats such as streaming audio, vector graphics, three-dimensional multimedia and virtual worlds. Browsers hand over data to the appropriate helper application, such as RealAudio, Adobe Acrobat, QuickTime, Shockwave and others.

5.11. Native Helper Programs: Native helper programs are integrated into the browser itself, just like some word processors include internal spell check programs. In practice, the browser identifies an element, then calls up a native helper program to execute an action. For example, when a browser identifies a file stored in the JPEG format (a type of compressed image), it calls up an internal program that can translate JPEG files. The internal program then processes and displays the image inside the browser's viewing area. Netscape has a native helper program that can handle JPEG files this way. Browsers that do not have the same capacity must use external helper programs to read images.

5.12. External Helper Programs: External helper programs address the fact that there are too many file formats for one browser to handle alone. Instead of one massive omni-lingual program that can read every type of file the WWW carries, browsers incorporate smaller external helper programs to accomplish the same end. These specialized programs are separate from the browser and perform functions identical to native helper programs, except that the actions are executed outside of the browser. For example, when a browser identifies a sound file stored in .WAV format, it calls up an external program that can translate .WAV files. The browser passes the .WAV file to the external helper program, which then processes and plays it.

There are two main differences between native and external helper programs. The first difference is that external helper programs run independently of the browser. This means that once files are passed from the browser to the external helper program, the browser is free to resume navigating the WWW. In contrast, native helper programs tie up the browser until the native program completes its action and is closed. The second difference concerns how the two types of helper programs are acquired. Native helper programs are included within the browser itself. However, external helper programs must be acquired independently by the end-user. The end-user then has to configure the browser to point to the external program, telling it when to use it (e.g., to view a particular file format) and where it is located within the computer's storage. Most often this is done through setting the Preferences area of the browser's Options.
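
The choice between a native and an external helper program can be pictured as a simple lookup on the file type. The following is only a minimal sketch under stated assumptions: the extension lists and the helper program names (wavplayer, pdfviewer, movieplayer) are hypothetical stand-ins for what a user would configure in the browser's Preferences, and no real browser is claimed to work exactly this way.

```python
import os

# Hypothetical configuration, standing in for a browser's Preferences.
# Formats the browser is assumed to display itself (native helpers):
NATIVE_EXTENSIONS = {".html", ".htm", ".jpg", ".jpeg", ".gif"}

# Formats handed to separately installed programs (external helpers).
# The program names are made up for illustration.
EXTERNAL_HELPERS = {
    ".wav": "wavplayer",
    ".pdf": "pdfviewer",
    ".mov": "movieplayer",
}


def choose_handler(filename):
    """Decide how a file would be handled, based on its extension.

    Returns a (kind, program) pair where kind is 'native', 'external'
    or 'unknown', and program is the helper name for external files.
    """
    extension = os.path.splitext(filename)[1].lower()
    if extension in NATIVE_EXTENSIONS:
        return ("native", None)
    if extension in EXTERNAL_HELPERS:
        return ("external", EXTERNAL_HELPERS[extension])
    return ("unknown", None)


for name in ["photo.JPG", "sound.wav", "model.xyz"]:
    print(name, choose_handler(name))
```

In this sketch an unknown format simply reports 'unknown'; this corresponds to the situation described above in which the end-user still has to acquire a helper program and tell the browser where it is located.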


6. Mark-up Languages

In the printing industry, mark-up refers to instructions for printing text in a particular style. Mark-up is also used in proofreading: editors mark the text to indicate the font type and size, and whether it should be set in bold or italics. Similarly, to display electronic text in a web page, instructions are given as mark-up within the text so that a parser (a computer program) understands how the text should appear on display. Mark-up is also used for data retrieval, particularly in the library and information field. Once the structure of a document is fixed, one can easily find which part of the document contains which kind of data.

Three basic concepts are fundamental to understanding all mark-up languages when described in SGML terms. These are i) a mark-up entity; ii) a mark-up element and its associated attributes; and iii) a document type.

In SGML, entities are the streams of symbols (characters or bytes of data, marks on a page, graphics, etc.) of which texts are composed. At a higher level of abstraction, a text is composed of representations of objects of various kinds, linguistically or functionally defined. Such objects do not appear randomly within a text; rather, various types of objects appear in specifiable relationships to other objects, i.e. they may be included within each other, linked to each other by reference or simply presented sequentially. This level of description sees text as composed of structurally defined objects, known as elements in SGML. The grammar defining how elements may legally be combined in a particular class of texts is known as a document type. These three fundamental concepts are adequate to describe all the complexities of marked-up texts, of whatever kind and for whatever purposes.

6.1 Standard Generalized Mark-up Language (SGML)
SGML is an application-independent, non-proprietary and extremely flexible mark-up language. It was first developed in 1970 as GML (Generalized Markup Language) and evolved into an International Standard (ISO, 1986). SGML is frequently referred to as a meta-language, which means that SGML is not a single language but a language that describes a family of markup languages. In other words, SGML is the framework for defining particular mark-up languages. SGML is an effective solution for handling the complexity of electronic publishing because of its powerful and flexible structuring capabilities, as well as its capacity to capture and organize information about the publications (“metadata”). It provides for descriptive, as opposed to procedural, markup. That is, it simply assigns names that categorize parts of a document instead of specifying the processing to be carried out.

SGML uses text characters both for the text and for the mark-up that describes that text. It has no proprietary codes; instead each user (or group of users) may create whatever codes are necessary and meaningful for what is being published. A publisher can define his own set of codes for book and journal publishing. The key to the self-defined codes in an SGML document is called the DTD (Document Type Definition). Code sets or DTDs can be specific to a single book or journal or can span a group of related books or journals. An SGML document consists of three distinct parts, namely:

Declaration: It gives fundamental information such as the language of the document and the code set being used (e.g. English/ASCII).

DTD: Details of codes and rules restricting their use.

Instance: The text being published, marked up with the codes described in the DTD.

SGML concerns itself with the structural features of a document, while appearance and display are left to the ultimate presentation system, which determines how those features appear on screen or in print. As a result, when documents move from system to system, or portions of one document are used in another, they do not need to be recoded. Because of its powerful and flexible structuring capabilities, as well as its capacity to capture and organize information about publications, SGML-coded documents can be searched effectively on the basis of both the structure and the content of the information. Many SGML repositories are considered “text databases”, since they enable a publisher to organize the published information in different ways for different contexts.

The contents of an SGML document are stored separately from its format; as a result, the contents, or parts of them, can be rendered in different ways for different needs, platforms and display methods. SGML is often used as an archival format and for document reuse and repurposing. Richly coded SGML documents also facilitate more complex searching than unstructured, word-processed text. In fully marked-up documents, for example, searches can be restricted to marked bibliographic citations, or such citations can be extracted from each document to create a citation database as a secondary product.

SGML liberates documents from the cumbersome and costly process of conversion from system to system. It does not require any special hardware or software: although there are a number of SGML-based systems available in the market, it is possible to create a valid SGML file in any word processor or text editor. SGML also preserves the document and its coding from obsolescence. Because an SGML document incorporates the key to its own codes (declaration and DTD), it is possible to validate the codes by parsing the SGML file. Parsing is a process by which the document instance is checked against the declaration and the DTD to make sure all the codes in a file are legal and used properly.

6.2. Extensible Markup Language (XML)

XML is a subset of the Standard Generalized Markup Language (SGML). It is designed to make it easy to interchange structured documents on the web. Like SGML, XML deals with the structure of a document and not its formatting. Cascading Style Sheets (CSS), developed for HTML, can also be used with XML to take care of formatting and appearance. Unlike HTML, XML allows for the invention of new codes. XML is not only consistent and compatible with SGML; it also simplifies SGML in many ways. For example, while SGML allows "tag minimization", enabling the omission of end tags, XML always requires explicit end tags, which makes it a lot easier to write tools and browsers. XML introduces the concept of a "well-formed" document, one in which the tags are nested correctly and proper XML syntax is followed. In addition, like SGML, XML allows for "valid" documents, which go a step beyond "well-formed" status by conforming to an explicit structure defined in a DTD. "Well-formedness" is a very appealing feature of XML, because it allows publishers to tag what they are publishing in whatever way is meaningful, without being confined to a specific set of tags (as with HTML) or needing to write a DTD.
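
The idea of a well-formed document can be tried out with a few lines of Python. The sketch below uses the standard library's xml.etree.ElementTree parser, which checks well-formedness only and does not validate against a DTD. The tag names (memo, to, body) are invented for illustration, which is exactly what XML permits.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Invented tags, correctly nested and closed.
good = "<memo><to>Library staff</to><body>Closed on <b>Friday</b>.</body></memo>"

# The <to> element is closed after its parent: incorrect nesting.
bad_nesting = "<memo><to>Library staff</memo></to>"

# The end tag for <to> is omitted, which XML does not allow.
missing_end = "<memo><to>Library staff</memo>"

for sample in (good, bad_nesting, missing_end):
    print(is_well_formed(sample), sample)
```

Only the first sample is accepted. No DTD is involved at any point, which illustrates why well-formedness alone is enough for simple interchange.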

An XML document may require a companion XSL (Extensible Stylesheet Language) stylesheet to be reformatted into RTF, LaTeX or any other format. XSL also makes it possible to offer database functionality from XML documents with no actual database needed. XML also defines how Internet Uniform Resource Locators can be used to identify component parts of XML data streams. As with an SGML document, an XML document can be verified to ensure that each component occurs in a valid place within the interchanged data stream; this is done by defining the role of each element of text in a formal model known as a Document Type Definition (DTD). An XML DTD allows computers to check, for example, that users do not accidentally enter a third-level heading without first having entered a second-level heading, something that cannot be checked using the Hypertext Markup Language (HTML). However, unlike in SGML, a DTD is not a necessity in XML. If no DTD is available, either because all or part of it is not accessible over the Internet or because the user failed to create it, an XML system can assign a default definition for undeclared components of the markup.

XML allows users to:

 bring multiple files together to form compound documents;

 identify where illustrations are to be incorporated into text files, and the format used to encode each illustration;

 provide processing control information to support programs, such as document validators and browsers;

 add editorial comments to a file.

Like SGML, XML does not have a predefined set of tags, of the type defined for HTML, that can be used to markup documents in a standardized template for producing particular types of documents. XML is a formal language that can be used to pass information about the component parts of a document to another computer system. XML is flexible enough to be able to describe any logical text structure, whether it be a form, memo, letter, report, book, encyclopaedia, dictionary or database.

XML is based on the concept of documents composed of a series of entities or objects. Each entity or object can contain one or more logical elements. Each of these elements can have certain attributes (properties) that describe the way in which it is to be processed. XML provides a formal syntax for describing the relationships between the entities, elements and attributes that make up an XML document, which can be used to tell the computer how it can recognize the component parts of each document. XML differs from other markup languages in that it does not simply indicate where a change of appearance occurs, or where a new element starts. XML sets out to clearly identify the boundaries of every part of a document, whether it be a new chapter or a reference to another publication. The structure of a document can be checked if the user provides a document type definition that declares each of the permitted entities, elements and attributes, and the relationships between them.
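
Because every element has explicit boundaries, a program can recover the full structure of a document, including each element's attributes. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the sample document and its tag and attribute names (book, chapter, title, ref, id, number, target) are hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up document: a book with two chapters, one of which
# contains a reference to another publication.
document = """<book id="b1">
  <chapter number="1"><title>Introduction</title></chapter>
  <chapter number="2"><title>Browsers</title><ref target="b2"/></chapter>
</book>"""


def outline(element, depth=0):
    """Return one line per element: indentation, tag name, attributes."""
    lines = ["  " * depth + element.tag + " " + str(dict(element.attrib))]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(document)
for line in outline(root):
    print(line)
```

The indentation of the printed outline mirrors the nesting of elements, so the chapter boundaries and the reference inside the second chapter are recovered without any knowledge of how the text is meant to look.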

6.3. Hypertext Markup Language (HTML)

Hypertext Markup Language (HTML) is an SGML application complete with a DTD. It is designed to tell a browser how to display documents on the web. HTML is the de facto language of the web and is largely responsible for the resurgence of interest in SGML in the past few years. Unlike SGML, HTML has a predefined set of codes that are easy to learn and use, and for which it is easy to build authoring tools. HTML codes are embedded in the text and communicate display instructions to a web browser such as Netscape Navigator or Microsoft Internet Explorer. Like SGML, HTML uses simple ASCII text for the text as well as for the HTML codes. An HTML page can thus be built using a word processing package or a text editor. There are several HTML editors and conversion programs that act much like a word processing package. These editors typically show the codes as they are inserted. In a What You See Is What You Get (WYSIWYG) environment, such as MS Word or other MS Windows packages, the user never sees these codes. Web browsers are similar to WYSIWYG word processors in that they read the embedded codes and then apply them to the specified text.
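
To see how codes and text share a single plain-text file, the sketch below separates the two using Python's standard html.parser module. It illustrates only the first step a browser must take (telling tags apart from text) and not how any particular browser renders a page; the sample page is made up.

```python
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Collect start-tag names and visible text separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Called once for every opening code such as <p> or <b>.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for the text between codes; skip whitespace-only runs.
        if data.strip():
            self.text_parts.append(data.strip())


page = ("<html><body><h1>Search Engines</h1>"
        "<p>Text with <b>bold</b> and <i>italic</i> words.</p>"
        "</body></html>")

collector = TagAndTextCollector()
collector.feed(page)
collector.close()

print(collector.tags)
print(" ".join(collector.text_parts))
```

The first list holds the embedded codes in document order, and the joined text is what a reader would see once the codes have been applied and hidden.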

HTML is competent at presenting text, graphics and images in a reasonably decent layout. Web browsers readily accommodate a multitude of plug-ins that allow the inclusion of audio, video, 3-D and other specialized files. Any of these can also be included as a link in a standard HTML page; clicking the link loads the plug-in to view or play the file. HTML files are small since they are simple text files. Further, static HTML web pages can be transformed into vibrant, dynamic and interactive web creations using ever-evolving web technologies like CGI scripts, Perl, Java, JavaScript, ASP, DHTML, XML and open database connectivity (ODBC).

The simplicity of HTML is also a serious limitation for books and journals. HTML does not provide enough codes to present the complexities of a scientific text, and it does not provide for the Greek and mathematical characters that are important to such text. Moreover, HTML is about presentation and not about structure or content. The only content it describes successfully is in its metadata codes or its title. Furthermore, an HTML file can be derived from an SGML file at any time, but the reverse is not possible. The competency of HTML at presenting text has been further enhanced by the use of Cascading Style Sheets (CSS).


6.4. Dynamic HTML

Dynamic HTML is a collective term for a combination of new Hypertext Markup Language (HTML) tags and options that let you create Web pages that are more animated and more responsive to user interaction than those built with previous versions of HTML. Much of Dynamic HTML is specified in HTML 4.0. Simple examples of dynamic HTML pages would include (1) having the color of a text heading change when a user passes a mouse over it or (2) allowing a user to "drag and drop" an image to another place on a Web page. Dynamic HTML can allow Web documents to look and act like desktop applications or multimedia productions.

The features that constitute dynamic HTML are included in Netscape Communications' Web browser, Navigator 4.0 and upwards (part of Netscape's Communicator suite), and in Microsoft's browser, Internet Explorer 4.0 and upwards. While HTML 4.0 is supported by both Netscape and Microsoft browsers, some additional capabilities are supported by only one of the browsers. The biggest obstacle to the use of dynamic HTML is that, since many users are still using older browsers, a Web site must maintain two versions of its pages and serve the version appropriate to each user's browser.

Both Netscape and Microsoft support:

 An object-oriented view of a Web page and its elements
 Cascading style sheets and the layering of content
 Programming that can address all or most page elements
 Dynamic fonts

6.5. Virtual Reality Modeling Language (VRML)

VRML, often pronounced “ver-mull,” is the Virtual Reality Modeling Language, the open standard for virtual reality on the Internet. You can use VRML to create three-dimensional worlds, representations of information, and games. Because it is an open standard, no one particular company controls the VRML specification, that is, the language definition. Theoretically, anybody can use VRML to write software or worlds without having to license technology from others. Using VRML, you can build a sequence of visual images into Web settings with which a user can interact by viewing, moving, rotating, and otherwise manipulating an apparently 3-D scene. For example, you can view a room and use controls to move through it as you would experience it if you were walking through it in real space.

To view a VRML file, you need a VRML viewer or browser, which can be a plug-in for a Web browser you already have. Among viewers you can download for the Windows platforms are blaxxun's CC Pro, Platinum's Cosmo Player, WebFX, WorldView, and Fountain. Whurlwind and Voyager are two viewers for the Mac. Virtual reality refers to an immersive environment, one that you feel you are inside of. You can attain this immersive feeling with computers using 3D graphics and audio. Sounds in a virtual world can be spatialized so that they sound louder when you are closer to them.

When virtual reality happens on the Internet, new possibilities arise for distributed, networked virtual environments. In HTML, in-line images let you include graphics from anywhere on the Web in your Web page. In VRML, you can have in-line parts of a virtual world, so that a chair in a VRML world can come from a URL on a server in France, while the garden comes from a server in Japan, and the soundtrack is from a URL on a server in England. In addition, hyperlinks from an object in a VRML world can lead to another URL on the Web, which could be another VRML world, an HTML page, or any other URL!

6.51 Uses of VRML

There are many applications for VRML, varying in focus from VRML’s open 3D file format, to its networking capabilities, to its multimedia nature. Here are some applications for which people are currently using VRML:

 Computer-aided design (CAD),
 Scientific simulations
 Games
 Data visualization
 Distributed, multi-user environments
 Social computing
 User interfaces to information
 Financial applications
 Product marketing and advertising
 Education
 Entertainment.


7.0 Search Engines

The growth of the Internet has led to a paradoxical situation. On the one hand, a huge amount of information is available on the Internet; on the other hand, the sheer volume of unorganized information makes it difficult for users to find relevant and accurate information speedily and efficiently. The Internet can be said to be the most exhaustive, important and useful source of information on almost all aspects of knowledge, hosted on millions of servers connected to it around the world. However, there are neither defined policies for hosting information nor a centralized database for organizing and searching the information available on the Internet. This makes the Internet the most diverse and unorganized source of information. For many users, searching for specific information is the main purpose of using the Internet, but with so much information available it has become very difficult for a common user to find precise and relevant information. To tackle this situation, computer scientists came up with search tools that sift through the information on the Internet to retrieve what a user requires. A variety of search, resource discovery and browsing tools have been developed to support more efficient information retrieval. Search engines are one such class of discovery tools.

Search engines use automated programs, variously called bots, robots, spiders, crawlers, wanderers and worms, that are developed to search the web. The robots traverse the web in order to index web sites. Some of them index web sites by title, some by uniform resource locators (URLs), some by words in each document in a web site, and some by combinations of these. These search engines function in different ways and search different parts of the Internet.
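
Indexing "by words in each document" is commonly done with an inverted index, which maps each word to the set of pages that contain it. The sketch below is a simplified illustration and not the method of any particular search engine: the pages are a small hypothetical in-memory dictionary (with made-up example.org URLs) rather than documents fetched by a real robot.

```python
import re
from collections import defaultdict

# Hypothetical pages a robot might have collected: URL -> page text.
PAGES = {
    "http://example.org/a": "Web browsers display HTML pages",
    "http://example.org/b": "Search engines index web pages",
}


def build_index(pages):
    """Map every lower-cased word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(PAGES)
print(search(index, "web pages"))
print(search(index, "search"))
```

A query is answered by intersecting the sets for its words, so no page text needs to be re-read at search time. An engine that indexed only titles or URLs would build the same kind of structure from those fields instead.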

7.1 Search Engines: Definition

Search engine is a generic term used for the software that “searches” the web for pages relating to a specific query. Google and Excite are two examples of common search engines that index and search a significant part of the web. Several web sites have their own search engines to index their own websites. The World Wide Web has several sites dedicated to indexing of information on all other sites. These sites allow a user to search the web for any word or combination of words for information resources on the web.

A search engine is a computer program that searches for documents on the Internet containing the terms specified by a user. A search engine can also be defined as a tool for finding, classifying and storing information from various websites on the Internet. It can help in locating relevant information on a particular subject by using various search methods. It is a service that indexes, organizes, and often rates and reviews Web sites. It helps users to find the proverbial needle in the Internet haystack. Different search engines work in different ways. Some rely on people to maintain a catalogue of Web sites or web pages; others use software to identify key information on sites across the Internet. Some combine both types of service. Searching the Internet for the same topic with different search engines therefore produces different results. Fig.1 shows the number of hits for 25 single-word queries conducted on nine search engines. Google found more total hits than any other search engine.










Fig.1: Number of hits for 25 single-word queries conducted on nine search engines. Google found more total hits than any other search engine.
(Source: http://searchengineshowdown.com/stats/size.shtml)

Search engines are also defined as online utilities that quickly search thousands of Web documents for a given word or phrase. Although there are some subscription-based search engines, most of them operate on revenue from advertisements. It should be noted that no single search engine holds the contents of every Web page on the Internet. Instead, each search engine defines its own scope in terms of the Web pages it covers. Moreover, some search engines index every word on every page, while others index only part of each document. Full-text search engines generally pick up every word in the text except commonly occurring stop words such as “a”, “an”, “the”, “is”, “and”, “or” and “www”. Some search engines distinguish upper case from lower case; others store all words without reference to capitalization. For these reasons, a user gets different results from different search engines.
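
The effect of stop words and capitalization on what gets indexed can be shown with a short sketch. The stop-word list is the one given above; the function name and its case_sensitive switch are hypothetical, chosen only to contrast the two policies described.

```python
import re

# Stop words taken from the examples in the text.
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "www"}


def index_terms(text, case_sensitive=False):
    """Return the words an engine would keep for indexing.

    Stop words are always dropped. With case_sensitive=False, all
    remaining words are stored in lower case; with True, the original
    capitalization is preserved.
    """
    terms = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        if word.lower() in STOP_WORDS:
            continue
        terms.append(word if case_sensitive else word.lower())
    return terms


sentence = "The Opera browser is a Web browser"
print(index_terms(sentence))
print(index_terms(sentence, case_sensitive=True))
```

Under the case-sensitive policy a query for "opera" would not match the stored term "Opera", which is one concrete reason the same query can return different results on different engines.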

Search engines are usually accessed using Web clients called Web browsers. Each search engine provides different search options and has its own peculiarities. Search engines also differ greatly in the types of resources they allow a user to search. Many search engines offer both search and browse interfaces.
