Syntax of MINT
A MINT query SELECTs a template containing a list of variables and wildcards. The variables must be bound to such values that the predicates in the whereClause are satisfied.
query ::= 'SELECT' selectList fromClause [whereClause]
Query structure: MINT V1.2 only supports the output of the template specified in the fromClause without further manipulation.
selectList ::= templateVar
Specifying the query template: A template consists of node variables and wildcards.
fromClause ::= 'FROM' nodeRef ',' templateRef
The special symbol # denotes the beginning of a sequence. The template a * b therefore differs from # a * b: in the first template, variable a may be bound to an event anywhere in a sequence, while in the second it must be bound to the event at the sequence beginning. The wildcards in MINT V1.2 can be annotated with constraints:
wildcards ::= '*' | '['low_boundary';'high_boundary']'
The wildcard * stands for zero or more events, if it appears by itself. If a constraint interval is used as a wildcard, the permissible number of events must belong to this interval. Note that the interval is closed at both ends. This means that infinity is a legal high boundary value. All elements of a template must be separated from each other using a blank space, e.g. # [0;1] a * b.
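To make the wildcard semantics concrete, here is a minimal Python sketch of how a template could be checked against one sequence of urls. It is not part of WUM or MINT: the function names and the `bindings` argument (a fixed variable-to-url mapping standing in for the predicates of a whereClause) are assumptions made for illustration. Since this page does not say how infinity is spelled as a high boundary, the sketch only accepts integer boundaries and treats * as the unbounded wildcard.

```python
import re

# Matches a constraint interval such as [0;1]; integer boundaries only (assumption).
INTERVAL = re.compile(r"^\[(\d+);(\d+)\]$")


def parse_template(text):
    """Split a blank-separated template into (anchored, elements)."""
    tokens = text.split()
    anchored = bool(tokens) and tokens[0] == "#"
    if anchored:
        tokens = tokens[1:]
    elements = []
    for token in tokens:
        if token == "#":
            raise ValueError("# is only handled at the start of a template")
        if token == "*":
            elements.append(("gap", 0, None))  # zero or more events
            continue
        m = INTERVAL.match(token)
        if m:
            elements.append(("gap", int(m.group(1)), int(m.group(2))))
        else:
            elements.append(("var", token))
    return anchored, elements


def matches(template, sequence, bindings):
    """Return True if the template fits some stretch of the sequence."""
    anchored, elements = parse_template(template)

    def match_from(ei, pos):
        if ei == len(elements):
            return True
        if elements[ei][0] == "var":
            name = elements[ei][1]
            return (pos < len(sequence)
                    and sequence[pos] == bindings[name]
                    and match_from(ei + 1, pos + 1))
        _, low, high = elements[ei]
        remaining = len(sequence) - pos
        upper = remaining if high is None else min(high, remaining)
        # Closed interval: both low and upper are permitted skip counts.
        return any(match_from(ei + 1, pos + skip)
                   for skip in range(low, upper + 1))

    starts = [0] if anchored else range(len(sequence) + 1)
    return any(match_from(0, start) for start in starts)
```

With the sequence x, a, y, b and the variables a and b bound to the urls a and b, the template a * b matches, # a * b does not (a is not at the sequence beginning), and # [0;1] a * b matches because the interval permits skipping the single event x.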
Introducing predicates: This is SQL-like syntax, but the queries are applied to groups of sequences.
whereClause ::= 'WHERE' condition ('AND' condition)*
A numeric expression must involve at most two column references and must be linear. The columns currently supported are:
url: Identifier of an event. However, a url may appear more than once in a sequence, since a user may revisit some web pages. An event in a sequence is uniquely identified by the url and the occurrence (below).
occurrence: Occurrence of the event in the sequence. This number increases if an event occurs more than once in a sequence.
support: The number of sequences where the event appears, in the context of events bound to node variables and preceding it in a sequence.
accesses: Total number of sequences where the url occurs, independently of occurrence number and preceding events.
In the current version, numeric expressions involve at most two column references and are always linear. Column names and string operators must be typed in lower case letters.
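The columns can be illustrated with a small Python sketch that derives them from a handful of sessions. This is one possible reading, not WUM's implementation: it assumes that sessions are merged on common prefixes into a tree, so that the support of a node is the number of sessions sharing the whole path leading to it. The function names and the data layout (nested dictionaries) are hypothetical.

```python
from collections import Counter


def annotate(session):
    """Pair every url with its occurrence number within the session."""
    seen = Counter()
    events = []
    for url in session:
        seen[url] += 1
        events.append((url, seen[url]))
    return events


def build_aggregate_tree(sessions):
    """Merge sessions on common prefixes; each node counts the sessions through it."""
    root = {"support": len(sessions), "children": {}}
    for session in sessions:
        node = root
        for event in annotate(session):
            child = node["children"].setdefault(
                event, {"support": 0, "children": {}})
            child["support"] += 1
            node = child
    return root


def accesses(sessions, url):
    """Number of sessions containing url, whatever its occurrence or prefix."""
    return sum(1 for session in sessions if url in session)
```

For the four sessions a b a, a b, a c and b a, the node (a, 1) directly below the root has support 3, its child (b, 1) has support 2, and the revisit (a, 2) below that has support 1, while accesses for the url a is 4 because every session contains it.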
For the following examples of MINT queries, consider this small demo web site:
After importing this access log file into WUM, the users' sessions (threshold: 30 min. maximum session duration) were determined and afterwards aggregated into this aggregated tree:
Have a look at the following example MINT queries:
Which paths of length between 1 and 5 lead to a node with support greater than 5?
select t
from node as a, template # [1;5] a as t
where a.support > 5
Note that a cannot be the first node of the path, because the interval [1;5] requires at least one event before it. From the results, you can see that for any value of a, the output aggregate tree comprises more than one path, none of which was traversed more than 5 times.
Which paths do visitors use to get from X.html to Y.html?
select t
from node as a b, template a * b as t
where a.url = "X.html"
and b.url = "Y.html"
Try templates containing a cycle!
select t
from node as a b, template a * b as t
where ( b.support / a.support ) > 0.3
and b.occurrence = 2
Please make sure that your MINT queries conform to the following syntax rules:
Blank spaces between operators, parentheses, etc. are compulsory!
All elements of the MINT syntax must be written in lower case letters!
The character # stands for the root node of the Aggregated Log.
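The first two rules lend themselves to a rough pre-check before a query is submitted. The Python sketch below is an assumption-laden illustration, not the MINT parser: it splits the query on blanks, skips tokens that start with a double quote (string literals such as "X.html"), and flags upper case letters or a parenthesis or operator character fused to other text. The set of two-character operators is a guess, since this page does not list the operators MINT accepts, and string literals containing blanks are not handled.

```python
# Characters that must stand alone as their own token (assumed set).
FUSABLE = set("()<>=/")
# Two-character operators tolerated as single tokens (assumption, not from MINT docs).
MULTI_CHAR_OPERATORS = {"<=", ">=", "<>", "!="}


def check_query(query):
    """Return a list of suspected violations of the blank and lower case rules."""
    problems = []
    for token in query.split():
        if token.startswith('"'):
            continue  # string literals may contain upper case and punctuation
        if token != token.lower():
            problems.append(f"upper case in {token!r}")
        if (len(token) > 1
                and token not in MULTI_CHAR_OPERATORS
                and FUSABLE & set(token)):
            problems.append(f"missing blank in {token!r}")
    return problems
```

The example queries on this page pass the check unchanged, whereas writing a.support>5 without blanks, or typing SELECT in upper case, is reported.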
December 3, 2004