The Cooperative Association for Internet Data Analysis
Documentation

This page documents the public interface of the BETA version of DDec, CAIDA's public DNS Decoding database.

Patterns

DDec understands two types of patterns: regexp and hostpat.

DDec's regexp is a subset of perl regexp (see tables below).

DDec's "hostpat" (hostname pattern) is a simpler pattern syntax designed specifically for matching hostnames. All legal lowercase hostname characters stand for themselves, and the most common special pattern expressions can be written with a single uppercase letter. No special characters are needed to match a literal string; e.g., hostpat "foo.bar.com" matches hostname "foo.bar.com" (and no other hostname).

Basic hostpat and regexp syntax:

meaning              hostpat  regexp     note
any digit            D        [0-9]
any letter           L        [a-z]
any alphanumeric     A        [a-z0-9]
any hex digit        X        [0-9a-f]
any label character  H        [-a-z0-9]  anything allowed in a hostname label
period               .        \.
any 1 char           _        .          "_" as in SQL "like" operator
any 0 or more chars  %        .*         "%" as in SQL "like" operator
word boundary        B        \b         [a-z0-9] on one side, [-.] or nothing on the other
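
The table above can be read as a character-by-character translation from hostpat to regexp. The following Python sketch illustrates that reading. It is not DDec's implementation: the function name `hostpat_to_regexp` is hypothetical, only the single-character tokens from the table are translated, and bracketed character classes and `<...>` bindings are not handled.

```python
import re

# Single-character hostpat tokens and their regexp equivalents,
# copied from the basic syntax table above.
HOSTPAT_TOKENS = {
    "D": "[0-9]",
    "L": "[a-z]",
    "A": "[a-z0-9]",
    "X": "[0-9a-f]",
    "H": "[-a-z0-9]",
    ".": r"\.",
    "_": ".",
    "%": ".*",
    "B": r"\b",
}


def hostpat_to_regexp(hostpat: str) -> str:
    """Translate the basic hostpat tokens; every other character is copied as-is.

    Assumption: no [...] classes or <...> bindings appear in the input.
    """
    return "".join(HOSTPAT_TOKENS.get(ch, ch) for ch in hostpat)


regexp = hostpat_to_regexp("LLLD+.example.com")
# Hostpats must match the entire hostname, hence fullmatch.
matched = re.fullmatch(regexp, "lax42.example.com", re.IGNORECASE) is not None
```

Because quantifiers and grouping characters are written the same way in both syntaxes, passing them through unchanged is enough for simple patterns.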

Other more advanced syntax that works in both hostpat and regexp:

meaning                                                  hostpat/regexp  note
char class: match "a", "b", or "c"                       [abc]           may contain only -a-z0-9
negated char class: match anything but "a", "b", or "c"  [^abc]
numbered grouping                                        (pattern)       used to capture a substring for decoding
unnumbered grouping                                      (?:pattern)
alternation: match "com" or "net"                        com|net         allowed only inside (...) or (?:...)
0 or more of last item                                   *
1 or more of last item                                   +
0 or 1 of last item                                      ?
n repeats of last item                                   {n}
n or more of last item                                   {n,}
n..m repeats of last item                                {n,m}

Both regexps and hostpats are case insensitive.

Hostpats are always anchored, i.e. must match the entire string. To get the effect of an unanchored beginning, start the hostpat with "%". E.g., the hostpat "bar.com" matches only "bar.com", but the hostpat "%bar.com" also matches "foobar.com", "quux.bar.com", or any other hostname ending in "bar.com".
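
The anchoring and case-insensitivity rules can be imitated with Python's `re.fullmatch` and `re.IGNORECASE`. This is only an illustrative sketch: the helper name `hostpat_like_match` is hypothetical, and the regexps passed to it are hand-written equivalents of the two hostpats discussed above.

```python
import re


def hostpat_like_match(regexp_equivalent: str, hostname: str) -> bool:
    """Match the way a hostpat is described to behave:
    the whole hostname must match, ignoring case."""
    return re.fullmatch(regexp_equivalent, hostname, re.IGNORECASE) is not None


# Hand-written regexp equivalents of the hostpats "bar.com" and "%bar.com".
ANCHORED = r"bar\.com"
LEADING_WILDCARD = r".*bar\.com"

results = {
    ("bar.com", "bar.com"): hostpat_like_match(ANCHORED, "bar.com"),
    ("bar.com", "foobar.com"): hostpat_like_match(ANCHORED, "foobar.com"),
    ("%bar.com", "foobar.com"): hostpat_like_match(LEADING_WILDCARD, "foobar.com"),
    ("%bar.com", "quux.bar.com"): hostpat_like_match(LEADING_WILDCARD, "quux.bar.com"),
}
```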

Rulesets

A ruleset is a collection of related rules and/or encodings for decoding a set of related hostnames (usually belonging to the same domain or organization). Here is an example of a simple ruleset containing 2 rules:

---
name: examplecorp
note: ExampleCorp, Inc.
rules:
- hostpat: <iata>D+.example.com
- hostpat: %.<clli>.example.net

DDec rulesets are displayed in a YAML format. Each YAML document (beginning with "---") contains one ruleset.

Every ruleset must have a name made of letters, digits, and underscores. An optional note can be used to describe the ruleset.

Rulesets used for decoding must define a list of one or more rules. The simplest type of rule contains just a pattern, either a hostpat or regexp. That pattern usually contains embedded <...> or <<...>> variable bindings that describe

  • a pattern to match a relevant substring of the hostname,
  • how to interpret that substring,
  • what variable to assign the result to.

Often, as in this example, these three attributes can be described by naming a single encoding.

For example, if the hostpat "<iata>D+.example.com" is matched against the hostname "lax42.example.com", then "<iata>" would match the 3-letter substring "lax", interpret it as an IATA airport code, and assign the result "Los Angeles" to the variable "loc".
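
The match, interpret, and assign steps of that example can be sketched in Python. Everything here is an assumption made for illustration: the one-entry `IATA_SAMPLE` table stands in for DDec's real IATA data, the regexp is a hand-written equivalent of the hostpat, and returning `None` for an unknown code is a simplification rather than documented behavior.

```python
import re

# One-entry stand-in for the standard "iata" encoding's mapping (hypothetical).
IATA_SAMPLE = {"lax": "Los Angeles"}

# Hand-written equivalent of the hostpat "<iata>D+.example.com".
RULE = re.compile(r"(?P<code>[a-z]{3})[0-9]+\.example\.com", re.IGNORECASE)


def decode(hostname: str):
    """Return {"loc": ...} if the rule matches and the code is known, else None."""
    m = RULE.fullmatch(hostname)          # 1. match the relevant substring
    if m is None:
        return None
    code = m.group("code").lower()
    if code not in IATA_SAMPLE:           # simplification: unknown code -> no result
        return None
    value = IATA_SAMPLE[code]             # 2. interpret it via the mapping
    return {"loc": value}                 # 3. assign the result to the variable


decoded = decode("lax42.example.com")
```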

Embedded variable bindings

Finer control over the pattern, mapping, or variable can be had by using more complex expressions, described below.

syntax                  match               interpret with                       assign result to
<ENCODING>              ENCODING's pattern  ENCODING's mapping                   ENCODING's var
<VAR:ENCODING>          ENCODING's pattern  VAR's mapping or ENCODING's mapping  VAR
<VAR=PATTERN>           PATTERN             VAR's mapping                        VAR
<VAR:ENCODING=PATTERN>  PATTERN             VAR's mapping or ENCODING's mapping  VAR
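
To make the four forms concrete, here is a small Python sketch that splits a binding into its variable, encoding, and pattern parts. The function `parse_binding` and its return shape are hypothetical. It reflects one reading of the table and is not DDec's parser.

```python
def parse_binding(binding: str) -> dict:
    """Split '<...>' or '<<...>>' into var / encoding / pattern parts.

    Parts that the binding does not spell out are returned as None, meaning
    "take this from the encoding" (or, for encoding, "none named").
    """
    inner = binding.strip("<>")
    head, has_pattern, pattern = inner.partition("=")
    var, has_encoding, encoding = head.partition(":")
    if has_encoding:                      # <VAR:ENCODING> or <VAR:ENCODING=PATTERN>
        return {"var": var, "encoding": encoding,
                "pattern": pattern if has_pattern else None}
    if has_pattern:                       # <VAR=PATTERN>
        return {"var": var, "encoding": None, "pattern": pattern}
    return {"var": None, "encoding": head, "pattern": None}   # <ENCODING>


parsed = [parse_binding(b) for b in
          ("<iata>", "<loc:iata>", "<loc=L+>", "<loc:iata=LLL>")]
```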

The most commonly used variable is loc (geographic location).

The difference between <...> and <<...>> is that the latter matches only if the subpattern is not adjacent to a letter. For example, the hostpat "%<iata>.example.com" would match "ge7.dallas.example.com" and interpret "las" as the 3-letter IATA airport code for Las Vegas, which is probably not what was intended. The hostpat "%<<iata>>.example.com" would not match, because there is an "l" adjacent to the "las".
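
One way to picture the "not adjacent to a letter" condition is with regexp lookarounds. The Python sketch below is an approximation under that assumption. DDec may implement <<...>> differently, and both regexps are hand-written stand-ins for the two hostpats.

```python
import re

# "%<iata>.example.com": any three letters directly before ".example.com".
PLAIN = re.compile(r".*([a-z]{3})\.example\.com", re.IGNORECASE)

# "%<<iata>>.example.com": the same, but the three letters may not have
# a letter immediately before or after them (assumed lookaround model).
GUARDED = re.compile(r".*(?<![a-z])([a-z]{3})(?![a-z])\.example\.com", re.IGNORECASE)

hostname = "ge7.dallas.example.com"
plain_match = PLAIN.fullmatch(hostname)      # captures the tail of "dallas"
guarded_match = GUARDED.fullmatch(hostname)  # rejected: a letter precedes the code
clean_match = GUARDED.fullmatch("ge7.las.example.com")
```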

Encodings

DDec defines a number of standard encodings, and rulesets may also define their own custom encodings.

Some standard encodings:

name    description                      variable  hostpat
iata    IATA 3-letter airport code       loc       LLL
icao    ICAO 4-letter airport code       loc       LLLL
locode  UN/LOCODE                        loc       LLAAA
pop     PoP names                        loc       L[L.]+
clli    first 6 characters of CLLI code  loc       LLL[L-]LL

Reusable named custom encodings can be defined in an "encodings" entry at the top level of a ruleset, and an anonymous one-time mapping can be defined under a "vars" entry in a rule. For example:

---
name: examplecorp
note: ExampleCorp, Inc.
encodings:
- citycode:
    mapping:
      la: Los Angeles, CA, US
      par: Paris, FR
      nyc: New York, NY, US
- airport:
    extends: iata
    mapping:
      lnd: London, UK
rules:
- hostpat: <citycode>D+.example.com
- hostpat: %.<airport>.example.net
- hostpat: %.<loc=L+>.example.org
  vars:
  - loc:
      mapping:
        chic: Chicago, IL, US
        bos: Boston, MA, US
        lond: London, UK

The rule for example.com uses a custom encoding named "citycode", which is defined earlier in the ruleset with custom codes for "la", "par", and "nyc". The rule for example.net uses the "airport" encoding, which is a custom encoding that has all the attributes of the standard "iata" encoding, with the addition of a nonstandard code for "lnd" that overrides the "lnd" code already defined by iata. Both of these named encodings could be reused by other rules.

The rule for example.org doesn't use a named encoding. Instead, it says that a substring that matches "L+" should be assigned to "loc", but first it should be looked up in the mapping defined for "loc" in the same rule.
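
The inheritance behavior described for "airport" resembles a dictionary merge in which the child's codes win. The Python sketch below shows that idea only. The helper `resolve_mapping` is hypothetical, the "iata" entries are placeholders rather than real DDec data, and the order in which multiple parents are merged is an assumption.

```python
def resolve_mapping(name: str, encodings: dict) -> dict:
    """Return the effective code -> value mapping, following 'extends'."""
    enc = encodings[name]
    parents = enc.get("extends", [])
    if isinstance(parents, str):          # "extends" may be one name or a list
        parents = [parents]
    merged = {}
    for parent in parents:                # assumption: later parents win
        merged.update(resolve_mapping(parent, encodings))
    merged.update(enc.get("mapping", {}))  # the encoding's own codes override
    return merged


encodings = {
    # Placeholder for the standard encoding; values are illustrative only.
    "iata": {"mapping": {"lax": "Los Angeles", "lnd": "(standard iata value)"}},
    "airport": {"extends": "iata", "mapping": {"lnd": "London, UK"}},
}

airport = resolve_mapping("airport", encodings)
```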

Rule domains

A rule's pattern must indicate a specific domain or small set of specific domains that DDec can identify. This means regexps must end in "$" and hostpats must not end in "%". For example:

hostpat             equivalent regexp           domains
%.L+.foo.com        \.[a-z]+\.foo\.com$         foo.com
%.L+.foo.(com|net)  \.[a-z]+\.foo\.(com|net)$   foo.com, foo.net
%.L+.foo.net(.uk)?  \.[a-z]+\.foo\.net(\.uk)?$  foo.net, foo.net.uk
%.L+.foo[1-3].com   \.[a-z]+\.foo[1-3]\.com$    foo1.com, foo2.com, foo3.com
%.L+.foo.com%       \.[a-z]+\.foo\.com          (invalid)
%.foo.%             \.foo\.                     (invalid)
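
The two syntactic conditions stated above (a regexp must end in "$", a hostpat must not end in "%") are easy to check mechanically. The Python sketch below checks only those two conditions. The function name is hypothetical, and DDec's actual domain identification is presumably more involved.

```python
def ends_in_identifiable_domain(kind: str, pattern: str) -> bool:
    """Check only the stated necessary condition on the end of a rule pattern."""
    if kind == "regexp":
        return pattern.endswith("$")
    if kind == "hostpat":
        return not pattern.endswith("%")
    raise ValueError(f"unknown pattern type: {kind!r}")


checks = {
    "%.L+.foo.com": ends_in_identifiable_domain("hostpat", "%.L+.foo.com"),
    "%.L+.foo.com%": ends_in_identifiable_domain("hostpat", "%.L+.foo.com%"),
    r"\.foo\.": ends_in_identifiable_domain("regexp", r"\.foo\."),
}
```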

Full ruleset syntax

The documentation above describes only the most common syntax. The complete syntax for rulesets is described below.

---
name: name                        # name of ruleset (required)
source: source                    # where did ruleset come from
note: arbitrary additional information
hostpat: hostpat                  # pattern that must match hostnames
regexp: regexp                    # pattern that must match hostnames
encodings:
  ENCODING_NAME_1:
    source: where did encoding come from
    note: arbitrary additional information
    extends: ENCODING             # name of another encoding from which this encoding will inherit
    (OR)
    extends:
    - ENCODING1                   # name of another encoding from which this encoding will inherit
    - (...more encoding names)
    hostpat: hostpat              # pattern to match (unless overridden in <...>)
    regexp: regexp                # pattern to match (unless overridden in <...>)
    var: name                     # name of variable to assign to (unless overridden in <...>)
    mapping:
      .encoding: ENCODING         # optional encoding for re-interpreting VALUEs
      CODE1: VALUE1
      CODE2: VALUE2
      (... more mappings)
  (... more encodings)
rules:
- hostpat: hostpat                # pattern that hostname must match (hostpat or regexp is required)
  regexp: regexp                  # pattern that hostname must match (hostpat or regexp is required)
  note: arbitrary additional information
  mapping_required: 1                    
      # if mapping_required is "true" or "1", and a var binding has a mapping
      # but the value extracted from the hostname does not match any of the
      # mapping's codes, then the rule is treated as not matching.
  vars:
    - VAR1: VALUE
    (OR)
    - VAR1:
        value: VALUE              # string, possibly with $-substitutions
        encoding: ENCODING        # name of encoding to use to decode extracted value
        (OR)
        encoding:
        - ENCODING1               # name of encoding to use to decode extracted value
        - ... (more encoding names)
        mapping:
          .encoding: ENCODING     # optional encoding for re-interpreting VALUEs
          CODE1: VALUE1
          CODE2: VALUE2
          (... more mappings)
    - (... more vars)
- (... more rules)

The value in a variable binding can be a simple literal string like "San Diego", or may contain $-substitutions:

  • "$N", where N is a number, will be replaced with the part of the hostname matched by the Nth set of (...) or <...> in the pattern.
  • "$var" will be replaced with the value of variable named "var" (which must be defined earlier in the same rule).
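
The two substitution forms can be sketched with a single regexp replacement in Python. This is an illustration, not DDec's code: the function `expand_value` and its arguments are hypothetical, `groups` is assumed to hold the captured substrings in pattern order, and `variables` holds values defined earlier in the same rule.

```python
import re


def expand_value(template: str, groups: list, variables: dict) -> str:
    """Replace $N with the Nth captured substring and $var with a variable's value."""
    def substitute(match):
        token = match.group(1)
        if token.isdigit():
            return groups[int(token) - 1]   # $1 refers to the first capture
        return variables[token]

    return re.sub(r"\$(\d+|[A-Za-z_]\w*)", substitute, template)


expanded = expand_value("router $1 in $loc", ["07"], {"loc": "San Diego"})
```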

In a rule, variable bindings may be embedded in the pattern or listed under "vars". E.g., these two rules are equivalent:

- hostpat: r<router=DD>.foo.net
- hostpat: r(DD).foo.net
  vars:
  - router: $1

A variable binding can even be both embedded and listed under "vars". This is useful if you need a mapping or multiple encodings, neither of which can be embedded:

- hostpat: <loc=LL>.bar.net
  vars:
  - loc:
      mapping:
        sd: San Diego, CA, US
        la: Los Angeles, CA, US

Encoding name syntax:

  • "/foo" refers to the global encoding "foo".
  • "foo" refers to the encoding "foo" in the current ruleset if there is one, otherwise the global encoding "foo".
  • "bar/foo" refers to the encoding "foo" in ruleset "bar".
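
The three lookup rules can be sketched as a small Python resolver. The data layout is an assumption for illustration (a dict of global encodings plus a dict of per-ruleset encodings), and `resolve_encoding` is a hypothetical name.

```python
def resolve_encoding(name, current_ruleset, ruleset_encodings, global_encodings):
    """Look up an encoding name following the three rules listed above."""
    if name.startswith("/"):                       # "/foo": global only
        return global_encodings[name[1:]]
    if "/" in name:                                # "bar/foo": ruleset "bar"
        ruleset, encoding = name.split("/", 1)
        return ruleset_encodings[ruleset][encoding]
    local = ruleset_encodings.get(current_ruleset, {})
    if name in local:                              # "foo": current ruleset first
        return local[name]
    return global_encodings[name]                  # then fall back to global


# Hypothetical data: each encoding is represented by a descriptive string.
global_encodings = {"foo": "global foo"}
ruleset_encodings = {
    "examplecorp": {"foo": "examplecorp foo"},
    "bar": {"foo": "bar foo"},
}

local_first = resolve_encoding("foo", "examplecorp", ruleset_encodings, global_encodings)
```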



  Page URL: http://ddec.caida.org/help.pl