In today's computing world, we frequently encounter data as strings of bits, numbers, and symbols, or as objects, which become meaningful when passed to a program in a given format. Data can be defined as a collection of mere symbols. When data is processed with semantic considerations, we obtain information, and knowledge can in turn be defined as organized information. In other words, knowledge can be considered data at a higher level of abstraction and generalization. Knowledge discovery is the extraction of valuable knowledge from a huge pool of data; that is, the process of detecting valid, novel, useful, and understandable patterns in data.
The Semantic Web represents data on the World Wide Web in a form whose meaning machines can process. It is a collaborative effort led by the W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs (Uniform Resource Identifiers) for naming.
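To make the RDF data model concrete, the following is a minimal Python sketch that stores statements as (subject, predicate, object) triples named by URIs and prints them in the W3C N-Triples line format. The `URI`, `Literal`, `Triple`, `to_ntriples`, and `objects_of` names are hypothetical helpers, and the `http://example.org/people/` namespace is made up for illustration; the FOAF `name` and `knows` URIs are real vocabulary terms. A real application would normally use a dedicated RDF library instead of hand-built strings.

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource identifier, written as <...> in N-Triples."""


class Literal(str):
    """A plain string literal, written as "..." in N-Triples."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


def term_to_nt(term: Union[URI, Literal]) -> str:
    """Render one term; literals get backslash, quote, and newline escaping."""
    if isinstance(term, URI):
        return f"<{term}>"
    escaped = (term.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize triples, one 'subject predicate object .' statement per line."""
    return "\n".join(
        f"{term_to_nt(t.subject)} {term_to_nt(t.predicate)} {term_to_nt(t.obj)} ."
        for t in triples
    )


def objects_of(triples: list[Triple], subject: URI, predicate: URI) -> list:
    """Simple pattern query: all objects for a given subject and predicate."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


FOAF = "http://xmlns.com/foaf/0.1/"   # real FOAF vocabulary namespace
EX = "http://example.org/people/"     # hypothetical namespace for this sketch

graph = [
    Triple(URI(EX + "alice"), URI(FOAF + "name"), Literal("Alice")),
    Triple(URI(EX + "alice"), URI(FOAF + "knows"), URI(EX + "bob")),
]
print(to_ntriples(graph))
```

Because every resource is named by a globally unique URI, triples produced by different applications can be merged into one graph and queried together, which is the integration role the paragraph above attributes to RDF.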
Data mining is a component of knowledge discovery. Within acceptable limits of computational efficiency, data mining finds patterns or models in data. It is used primarily by retail, financial, communication, and marketing organizations with a strong consumer focus. It enables these companies to determine relationships among "internal" factors, such as price, product positioning, or staff skills, and "external" factors, such as economic indicators, competition, and customer demographics. It also enables them to determine the impact of these factors on sales, customer satisfaction, and corporate profits.
Before a data set is mined, it needs to be cleaned to eliminate errors and ensure consistency. Data cleaning usually involves straightforward statistical techniques but may sometimes require highly sophisticated data analysis.
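A minimal sketch of "straightforward statistical" cleaning, assuming records are flat dictionaries with hashable values: it removes exact duplicates, drops rows missing the field of interest, and filters outliers with the interquartile-range (Tukey fence) rule. The IQR rule and its 1.5 multiplier are a common choice assumed here, not one named in the text, and the `clean` helper and `amount` field are hypothetical.

```python
from statistics import quantiles


def clean(records: list[dict], field: str, k: float = 1.5) -> list[dict]:
    """Drop duplicates, rows missing `field`, and IQR outliers in `field`."""
    # 1. Remove exact duplicates while preserving the original order.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # 2. Keep only rows that actually have a value for the field.
    complete = [r for r in unique if r.get(field) is not None]

    # 3. Tukey fences: keep values within [Q1 - k*IQR, Q3 + k*IQR].
    q1, _, q3 = quantiles([r[field] for r in complete], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in complete if low <= r[field] <= high]


orders = [
    {"id": 1, "amount": 20}, {"id": 2, "amount": 22}, {"id": 3, "amount": 21},
    {"id": 4, "amount": 23}, {"id": 5, "amount": 19}, {"id": 6, "amount": 24},
    {"id": 7, "amount": 500},   # likely a data-entry error
    {"id": 8, "amount": None},  # missing value
    {"id": 1, "amount": 20},    # exact duplicate of the first row
]
cleaned = clean(orders, "amount")
print([r["id"] for r in cleaned])
```

Quartiles and the IQR are used instead of the mean and standard deviation because a single extreme value such as 500 barely moves them, so the fences stay meaningful even when outliers are present.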
The World Wide Web has become an extremely popular medium for publishing, and it clearly provides a rich repository of data for mining. However, searching, comprehending, and using the semistructured information stored on the Web poses a significant challenge, because this data is more complex and dynamic than the information contained in structured commercial databases. Chapters 4 and 5 describe the features and particulars of the data-mining techniques necessary for building an intelligent Web.
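To show what "semistructured" means in practice, the following is a small sketch using Python's standard `html.parser` to pull the title and hyperlinks out of a page. The tags supply some structure, but the text between them is free-form. The `LinkExtractor` class, the sample page, and the `example.com`/`example.org` URLs are hypothetical illustrations.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the page title and absolute hyperlink targets from HTML."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip <a> tags without a usable href
                self.links.append(urljoin(self.base_url, href))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


page = ('<html><head><title>Demo</title></head><body>'
        '<p>See <a href="/docs">docs</a> and <A HREF="https://example.org/x">x</A>.</p>'
        '</body></html>')
parser = LinkExtractor("https://example.com/index.html")
parser.feed(page)
print(parser.title, parser.links)
```

Relative links are resolved against the page URL with `urljoin`, so the extracted hyperlinks can be compared across pages. This matters when the links are later treated as a graph.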
To improve on keyword-based indexing, the basis of web search engines, researchers have applied data mining to web-page ranking. In this context, data mining helps search engines find high-quality web pages. Web services and their usability must be improved and made more comprehensible if they are to reach their full potential. As researchers continue to develop data-mining techniques, this technology will play a significant role in meeting the challenges of building an intelligent Web. The Web is an immense, dynamic collection of data that includes countless hyperlinks and huge volumes of access and usage information, which makes it a rich and exceptional source for data mining.
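One well-known link-based ranking method is PageRank. The text does not say which ranking algorithm it has in mind, so treat this as an illustrative substitute. The sketch below computes PageRank by power iteration on a tiny hypothetical link graph. It assumes the commonly used damping factor of 0.85 and spreads the rank of pages with no outlinks evenly across all pages.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(max_iter):
        # Rank held by dangling pages (no outlinks) is shared by every page.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Each page passes its damped rank equally along its outgoing links.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        converged = sum(abs(new[k] - rank[k]) for k in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical four-page web: most links point to C.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Page C ranks highest because three pages link to it. Page D receives only the baseline share, (1 - 0.85) / 4, because no page links to D. The ranks always sum to 1, so they can be read as the long-run probability that a random surfer is on each page.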
Web mining is the application of data-mining techniques to the World Wide Web. In web mining, data can be collected on the server side, on the client side, or at proxy servers, or it can be acquired from an organization's database. Each type of data collection differs not only in the location of the data source but also in the characteristics of the data, the segment of the population from which the data is collected, and its method of implementation. Many kinds of data can be used in web mining. A survey paper on web-usage mining (Srivastava et al. 2000) classified the collected data into the following types:
• Content: The actual data in web pages, such as text, images, audio, video, hyperlinks, and metadata.
• Structure: Data that describes the organization of content. Intra-page structure includes the arrangement of various HTML or XML tags within a given page. The primary kind of inter-page structure information is the hyperlinks connecting one page to another.
• Usage: Secondary data derived from users' interactions with the Web, such as web-server access logs, proxy-server logs, user profiles, and cookies.
• User Profile: Data that provides demographic information about a website's users, such as registration data and customer profile details.
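Server-side usage data is typically collected in web-server access logs. The sketch below assumes the logs use the Apache Common Log Format. It parses each line into a usage record and then groups each host's requests into sessions, starting a new session after 30 minutes of inactivity. The 30-minute timeout is a common heuristic assumed here. The `parse_line` and `sessionize` helpers are hypothetical. The sample lines use reserved documentation IP addresses. Note that `%b` month parsing expects English month abbreviations, as in the default C locale.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str):
    """Turn one log line into a usage record, or None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    return {
        "host": m["host"],
        "time": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "path": m["path"],
        "status": int(m["status"]),
    }


def sessionize(records, timeout=timedelta(minutes=30)):
    """Group each host's requested paths into sessions split by idle gaps."""
    sessions: dict[str, list[list[str]]] = {}
    last_seen: dict[str, datetime] = {}
    for rec in sorted(records, key=lambda r: r["time"]):
        host = rec["host"]
        if host not in last_seen or rec["time"] - last_seen[host] > timeout:
            sessions.setdefault(host, []).append([])  # start a new session
        sessions[host][-1].append(rec["path"])
        last_seen[host] = rec["time"]
    return sessions


logs = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:58:02 -0700] "GET /products.html HTTP/1.0" 200 1045',
    '192.0.2.1 - - [10/Oct/2000:15:10:00 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '198.51.100.7 - - [10/Oct/2000:14:00:00 -0700] "GET /about.html HTTP/1.0" 404 -',
    'not a log line',
]
records = [r for r in map(parse_line, logs) if r is not None]
sessions = sessionize(records)
print(sessions)
```

Sessions like these are the raw material for usage mining, such as finding frequently followed navigation paths. Log entries carry no demographic information, so user-profile data would have to be joined in separately.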