WWW.BOOK.XLIBX.INFO
FREE ELECTRONIC LIBRARY - Books, abstracts, thesis
 
<< HOME
CONTACTS

Pages:   || 2 |

«Algorithms for Mining Sequential Patterns K.M.V. Madan Kumar and 2Dr. P.V.S. Srinivas Research Scholar, CMJ University, Meghalaya, India Professor & ...»

-- [ Page 1 ] --

International Journal of Information Sciences and Application.

ISSN 0974-2255 Volume 3, Number 1 (2011), pp. 59-69

© International Research Publication House

http://www.irphouse.com

Algorithms for Mining Sequential Patterns

K.M.V. Madan Kumar and 2Dr. P.V.S. Srinivas

Research Scholar, CMJ University, Meghalaya, India

Professor & Head, Dept. of CSE,

Geethangali College of Engg. & Tech, Hyderabad, India

E-mail: 1madankukunuri@gmail.com

Abstract

We are given a database of sequences, where each sequence is a list of transactions ordered by transaction time, and each transaction is a set of items.

The problem is to discover all sequential patterns with a user specified minimum support, where the support of a pattern is the number of datasequences that contain the pattern. An example of a sequential pattern is 5% of customers bought `Foundation' and `Ring world' in one transaction, followed by `Second Foundation' in a later transaction". Here in this paper we have reviewed various algorithms to mine the sequential patterns from sequence data bases and published those algorithms.

Over View of Datamining Data mining [Chen et al. 1996] is the process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from large information repositories such as: relational database, data warehouses, XML repository, etc. Also data mining is known as one of the core processes of Knowledge Discovery in Database (KDD).Usually there are three processes in KDD. One is called preprocessing, which includes data cleaning, integration, selection and transformation. The main process of KDD is the data mining process, in this process different algorithms are applied to produce hidden knowledge. After that comes another process called post processing, which evaluates the mining result according to users’ requirements and domain knowledge. Regarding the evaluation results, the knowledge can be presented if the result is satisfactory, otherwise we have to run some or all of those processes again until we get the satisfactory result. As not all the data in the database are related to our mining task, the process is to select task related data from the integrated resources and transform them into a format that is ready to be mined. Various data mining techniques are applied to the data source, different 60 K.M.V. Madan Kumar and Dr. P.V.S. Srinivas knowledge comes out as the mining result. Those knowledge are evaluated by certain rules, such as the domain knowledge or concepts. After we get the knowledge, the final step is to visualize the results. They can be displayed as raw data, tables, decision trees, rules, charts, data cubs or 3D graphics. This process is try to make the data mining results easier to be used and more understandable.

Types of Mining Generally speaking, there are two classes of data mining descriptive and prescriptive.

Descriptive mining is to summarize or characterize general properties of data in data repository, while prescriptive mining is to perform inference on current data, to make predictions based on the historical data. There are various types of datamining techniques such as association rules, classifications and clustering. Based on those techniques web mining and sequential pattern mining are also well researched. We will review those different types of mining techniques with examples. Association Rule Association rule mining, one of the most important and well researched techniques of data mining, was first introduced in [Agrawal et al. 1993]. It aims to extract interesting correlations, frequent patterns, associations or casual structures among sets of items in the transaction databases or other data repositories.

Example 1.1.

In an online book store there are always some tips after you pur- chase some books, for instance, once you bought the book Data Mining Concepts and Techniques, a list of related books such as: Database System 40%, Data Warehouse25%, will be presented to you as recommendation for further purchasing.

In the above example, the association rules are: when the book Data Mining Concepts and Techniques is brought, 40% of the time the book Database System is brought together, and 25% of the time the book Data Warehouse is brought together.

Those rules discovered from the transaction database of the book store can be used to rearrange the way of how to place those related books, which can further make those rules more strong. Those rules can also be used to help the store to make his market strategies such as: by promotion of the book Data Mining Concepts and Techniques, it can blows up the sales of the other two books mentioned in the example.

Association rules are widely used in various areas such as telecommunication networks, market and risk management, inventory control etc. Various association mining techniques and algorithms will be briefly introduced and compared later in this Chapter.

Sequential Pattern Mining As we know, data are changing all the time, especially data on the web are highly dynamic. As time passes by, new datasets are inserted, old datasets are deleted while some other datasets are updated. It is obvious that time stamp is an important attribute of each dataset, also it is important in the process of data mining and it can give us more accurate and useful information. For example, association rule mining does not take the time stamp into account, the rule can be Buy A= Buy B. If we take time stamp into account then we can get more accurate and useful rules such as: Buy A Algorithms for Mining Sequential Patterns 61 implies Buy B within a week, or usually people Buy A every week. As we can see with the second kind of rules, business organizations can make more accurate and useful prediction and consequently make more sound decisions. A database consists of sequences of values or events that change with time is called a time-series database [Han and Kamber 2000], a time-series database records the valid time of each dataset.





For example, in a time-series database that records the sales transaction of a supermarket, each transaction includes an extra attribute indicate when the transaction happened. Time-series database is widely used to store historical data in a diversity of areas such as, financial data, medical data, scientifical data and so on. Different mining techniques have been designed for mining time-series data [Han and Kamber 2000], basically there are four kinds of patterns we can get from various types of

time-series data:

Trend analysis: trend analysis is to find the evolution patterns of attributes over time, they can be long-term trend movements, cyclic movements or variations, seasonal movements and irregular/random movements. For example, with the history stock prices of Microsoft, we can model the price changes into a function Y=F(t), which can be illustrated as a time series graph.

Similarity search: similarity search tries to find sequences that differ only slightly.

Similarity searching is a blurry matching process that can tolerate some differences within a certain threshold. Based on the length of sequences we are trying to match, sequence matching can be classified as: subsequence matching and whole sequence matching. Suppose we transform the data of stock prices into curves, those curves include many different shapes such as up, sharp up, down, big down.

Sequential patterns: sequential pattern mining is trying to find the relationships between occurrences of sequential events, to find if there exist any specific order of the occurrences. We can find the sequential patterns of specific individual items, also we can find the sequential patterns cross different items. Sequential pattern mining is widely used in analyzing of DNA sequence.

Periodical patterns: periodical patterns are those recurring patterns in the timeseries database, periodicity can be daily, weekly, monthly, seasonal or yearly.

Obviously, periodical patterns mining can be viewed as sequential pattern mining by taking the periodical sequences as a set of sequences.

For periodical pattern mining, we have a sequence of data for a very long time, but we can partition them based on periods and get a set of sequences.

Sequential database consists of sequences of ordered events with or without concrete notion of time. Sequential database is a special case of time series database, consequently most researches in sequential pattern mining focus on two main issues.

The first issue is sequential pattern mining, aiming at finding the frequently occurred sequences to describe the data or predict future data. The second issue is periodical pattern mining, which can be viewed as sequential pattern mining.

62 K.M.V. Madan Kumar and Dr. P.V.S. Srinivas Sequential Patten Mining Problem Example 2.1. From a book store’s transaction database history, we can find the frequent sequential purchasing patterns, for example 80% customers who brought the book Database Management typically bought the book Data Warehouse and then brought the book Web Information System with certain time gap.

Sequential pattern mining was first introduced in [Agrawal and Srikant 1995]. In the above example, all those books need not to be brought at the same time or consecutively, the most important thing is the order in which those books are brought and they are bought by the same customer. 80% here represents the percentage of customers who comply this purchasing habit.

Let D be a database of customer transactions, I=I1, I2, · · ·, Im be a set of m distinct attributes called items, T be transaction that includes {customer-id, transaction-time, item-purchased }, si be an itemsets, which contains a set of items from I, S be a sequence that consists of an ordered list of itemsets s1, s2, · · ·, sn. No customer has more than one transaction with the same transaction time. (10,

20) · · · are itemsets, while (30)(90) · · · are sequences. All the items in the same round bracket have the same transaction time, while there is an order between round brackets within the large angle bracket Sequential patterns indicate the correlation between transactions while association rule represents intra transaction relationships. In association rule mining of transaction database, the mining results are about which items are brought together frequently, those items must come from the same transaction. While the result of sequential pattern mining are about which items are brought in a certain order by the same customer, those items come from different transactions.

Sequential pattern is a sequence of itemsets that frequently occurred in a specific order, all items in the same itemsets are supposed to have the same transaction-time value or within a time gap. Usually all the transactions of a customer are together viewed as a sequence, usually called customer-sequence, where each transaction is represented as an itemsets in that sequence, all the transactions are list in a certain order with regard to the transaction-time.

Support, a customer support a sequence s if s is contained in the corresponding customer-sequence, the support of sequence s is defined as the fraction of customers who support this sequence. Sequential pattern mining is the process of extracting certain sequential pat- terns whose support exceed a predefined minimal support threshold.

Support(s)= Noofsupport customers Total Noof customers Algorithms for Mining Sequential Patterns 63

–  –  –

Since the number of sequences can be very large, and users have different interests and requirements, to get the most interesting sequential patterns, usually a minimum support is pre-defined by users. By using the minimum support we can prune out those sequential patterns of no interest, consequently make the mining process more efficient.

–  –  –

However some sequential patterns that do not satisfy the support threshold are still interesting. Another metric called surprise has been introduced to measure the interestingness of those sequences in [Yang et al. 2001], a sequence s is surprising pattern if its occurrence differs greatly from the expected occurrence by treading every items equal. In the surprise metric the information gain in information theory was proposed to measure the overall degree of surprise, Sequential pattern mining approaches There exists a great diversity of algorithms for sequential pattern mining.. In this Section we first introduce some general and basic algorithms for sequential pattern mining, extensions of those algorithms for special purposes, such as multidimensional sequential pattern mining and incremental mining are covered later on.

Also periodical pattern mining is elaborated as an extension of sequential pattern mining.

General Algorithms for Sequential Pattern Mining. Most of the basic and earlier algorithms for sequential pattern mining are based on the Apriori property proposed in association rule mining.

[Agrawal and Srikant 1994], the property states that any sub-pattern of a frequent pattern must be frequent. Based on this heuristic, a series of Apriori-like algorithms have been proposed: AprioriAll, AprioriSome, DynamicSome in [Agrawal and Srikant 1995], GSP [Srikant and Agrawal 1996] and SPADE [Zaki 2001]. Later on another a series of data projection based algorithms were proposed, which includes FreeSpan [Han et al. 2000] and PrefixSpan [Pei et al. 2001]. SPADE [Zaki 2001] is a lattice based algorithm, MEMISP [Lin and Lee 2002] is a memory indexing based approach, while SPIRIT [Garofalakis et al. 1999] integrates constraints by using regular expression.

AprioriAll. Sequential pattern mining was first introduced in [Agrawal and Srikant 1995] by Agrawal, three Apriori based algorithms were proposed.

Given the transaction database with three attributes customer-id, transaction-tim e and purchased-items, the mining process were decomposed into five phases (Example of the first three phases are shown in Table I):

Sort Phase: the original transaction database is sorted with customer-id as the major key and transaction time as the minor key, the result is set of customer sequences.

Table I(a) shows the sorted transaction data.

L-itemsets Phase: the sorted database is scanned to obtain large 1-itemsets according to the predefined support threshold. Suppose the minimal support is 40%, in this case the minimal support count is 2, the result of large 1-itemsets is listed in Table I(b).



Pages:   || 2 |


Similar works:

«100blumen || Under siege [Tape / Kassette / MC] Professionell gemachte Tapeversion des neuen Blumen-Albums mit Booklet! Die unbelehrbaren Elektro-DnB-Punk-Asseln von 100blumen beehren uns ihrem Album UNDER SIEGE. Das unfassbar gut aussehende Blumenkrafttrio aus Düsseldorf ballert euch ihren unverkennbaren Stil, jene wilde Mischung aus elektronischen Elementen, punkigen und dann wieder crustigen Parts und dem alles, aber auch wirklich alles überrollenden Schlagzeug um die Ohren, dass es...»

«VTT WORKING PAPERS 82 ESPOO 2007 Please find updated report: http://www.vtt.fi/inf/pdf/tiedotteet/2009/T2493.pdf Design and operation of power systems with large amounts of wind power State-of-the-art report Hannele Holttinen & Bettina Lemström, VTT, Finland Peter Meibom & Henrik Bindner, Risø National Laboratories Antje Orths, Energinet.dk, Denmark Frans van Hulle, EWEA Cornel Ensslin, ISET; Albrecht Tiedemann, DENA, Germany Lutz Hofmann & Wilhelm Winter, E.ON Netz, Germany Aidan Tuohy &...»

«FORSCHUNGSUND LEHRBERICHT 2009/2010 FACHBEREICH 4: INFORMATIK FORSCHUNGSUND LEHRBERICHT 2009/2010 Forschungsund Lehrbericht 2009/2010 Fachbereich 4: Informatik Universität Koblenz-Landau November 2010 Impressum Herausgeber Fachbereich Informatik der Universität Koblenz-Landau Redaktion Manfred Jackel Fachbereich Informatik Postfach 201 602, 56016 Koblenz Mail: jbinf@uni-koblenz.de ISSN 1613-3897 Druck Druckerei + Verlag Dietmar Fölbach, Koblenz Auflage 830 Das Titelbild zeigt eine 3D-Karte...»

«Verfassungsschutzbericht 2012 Vorabfassung Inhaltsverzeichnis I. Verfassungsfeindliche Zielsetzungen 1. Rechtsextremismus 2. Linksextremismus 3. Islamismus und Ausländerextremismus II. Erscheinungsformen des Extremismus mit Auswirkungen auf den Freistaat Sachsen 1. Rechtsextremismus 1.1 Personenpotenzial 1.2 Neonationalsozialisten 1.3 Nationaldemokratische Partei Deutschlands (NPD) und ihre Jugendorganisation Junge Nationaldemokraten (JN) 1.3.1 Nationaldemokratische Partei Deutschlands (NPD)...»

«ABSTRACT POSTER BOOK 2013 Doctoral Education Conference Wednesday, January 23, 2013 6:00 7:30 pm American Association of Colleges of Nursing Doctoral Education Conference January 23 – 26, 2013 Hotel del Coronado, San Diego, California Poster Presentations Abusive Head Trauma in Infants and The Shape-Shifting Role of Crying Denise C. Abdoo, MSN, RN Co-Presenters: Madalynn Neu, PhD, RN Paula Meek, PhD, RN University of Colorado Denver Aurora, Colorado Educational Preparation for Nursing...»

«FINANCIAL ANALYST'S HANDBOOK I Methods, Theory, and Portfolio Management Edited by SUMNER N. LEVINE State University of New York Stony Brook, New York TECHNlSCHE HOCHSCHULE DARMSTADT Fachbereich 1 Gesamtbibliothek Betriebswirtschaftslehre Inventar-Nr. A.0LLZ11. Abste!!-Nr. Sachgebiete Homewood, Illinois 60430 DOW JONES-IRWIN Contents VOLUME I METHODS, THEORY, AND PORTFOLIO MANAGEMENT Introduction 1. Overview of Financial Analysis, William C. Norby. The Role of the Financial Analyst in the...»

«Recent progress on nonlinear elliptic and parabolic problems and related Abstract methods E. Norman Dancer (University of Sydney), Yihong Du (University of New England), Konstantin Mischaikow (Rutgers University), Peter Polacik (University of Minnesota), Xiaoqiang Zhao (Memorial University of Newfoundland)... October 7–October 12, 2007 1 Overview In recent years, significant progress has been made on the analysis of a number of important features of nonlinear partial differential equations...»

«Development of a Prescription Drug Management System (SmartPill) A Thesis In TCC402 Presented to The Faculty of the School of Engineering and Applied Science University of Virginia In Partial Fulfillment Of the Requirements for the Degree Bachelor of Science in Computer Engineering by Spence Green Other Group Members: Jonathan Kelley, Electrical Engineering Brad Pinney, Electrical Engineering March 23, 2004 On my honor as a University student, on this assignment I have neither given nor...»

«Pursuant to article 16 para. 3 of the German Securities Prospectus Act investors who have already agreed to purchase or subscribe for Notes issued under the Programme (as defined herein) before this Fourth Supplement (as defined herein) has been published shall have the right, exercisable within two working days after the publication of this Fourth Supplement, to withdraw their purchase or subscription orders, provided that the new factor arose before the final closing of the offer to the...»

«Christina Kessler Amo ergo sum Ich liebe, also bin ich Selbstrealisation Der Weg in eine neue Wirklichkeit Arbor Verlag Freiamt im Schwarzwald Ich widme dieses Buch den Meistern meines Lebens, meinem verstorbenen Vater und dem „Inder“.Copyright © der deutschen Ausgabe: 2002 by Arbor Verlag, Freiamt 1 2 345 Auflage 02 03 04 05 06 Erscheinungsjahr Die in diesem Buch zitierten Textstellen wurden behutsam der neuen deutschen Rechtschreibung angepaßt. Lektorat: Eva Bachmann Dieses Buch wurde...»

«Kieler Beiträge zur Filmmusikforschung, 4, 2010 / 2 Kieler Beiträge zur Filmmusikforschung, 4, 2010 / 3 Kieler Beiträge zur Filmmusikforschung, 4, 2010 / 4 Impressum Copyright by Kieler Gesellschaft für Filmmusikforschung, Kiel 2010 Copyright (für die einzelnen Artikel) by den Autoren, 2010 ISSN 1866-4768 Verantwortliche Redakteure: Tarek Krohn, Willem Strank Kieler Beiträge zur Filmmusikforschung ISSN 1866-4768 Herausgeber: Heldt, Dr. Guido (Bristol) Krohn, Tarek (Kiel) Lehmann M.A.,...»

«Sustainable Consumption Decisions: An Examination of Consumer Cognition and Behavior Dissertation Mag. Verena Maria Gruber ID: 0550856 PhD Advisors: Prof. Bodo B. Schlegelmilch, WU Vienna, International Marketing Management Prof. Michael J. Houston, Carlson School of Management, University of Minnesota Table of Contents List of Tables List of Figures Acknowledgement Abstract (English) Abstrakt (German) 1. Introduction 1.1. Background 1.2. Consumer Decision Making with Regards to Sustainable...»





 
<<  HOME   |    CONTACTS
2016 www.book.xlibx.info - Free e-library - Books, abstracts, thesis

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.