Config File

Source

Sources can be defined as one of two types. The first is a sitemap, which is either a traditional sitemap that points to resources or a sitemap index that points to a set of sitemaps.

The other is a sitegraph, which is a pre-computed graph for a site.
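To illustrate the difference between a traditional sitemap and a sitemap index, here is a minimal Python sketch that inspects the root element of a sitemap document. It relies on the sitemaps.org convention that a sitemap has a `urlset` root and a sitemap index has a `sitemapindex` root. The function name `sitemap_kind` and the sample documents are hypothetical, and this is not how Gleaner itself performs the check.

```python
import xml.etree.ElementTree as ET


def sitemap_kind(xml_text: str) -> str:
    """Return 'sitemap', 'sitemapindex', or 'unknown' from the root element."""
    root = ET.fromstring(xml_text)
    # Drop the namespace part of the tag, e.g.
    # '{http://www.sitemaps.org/schemas/sitemap/0.9}urlset' -> 'urlset'
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "urlset":
        return "sitemap"
    if local_name == "sitemapindex":
        return "sitemapindex"
    return "unknown"


# Hypothetical sample documents (example.org URLs are placeholders).
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/dataset/1</loc></url>
</urlset>"""

SITEMAP_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.org/sitemap-1.xml</loc></sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    print(sitemap_kind(SITEMAP))
    print(sitemap_kind(SITEMAP_INDEX))
```

Either kind of document is configured with `sourcetype: sitemap`; the sketch only shows how the two can be told apart.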

An example of each format follows:

sitemap

- sourcetype: sitemap
  name: unidata
  logo: ""
  url: https://www.unidata.ucar.edu/sitemap.xml
  headless: false
  pid: https://www.re3data.org/repository/r3d100010355
  propername: UNIDATA
  domain: http://www.unidata.ucar.edu/
  active: false
  credentialsfile: ""
  other: {}
  headlesswait: 0
  delay: 0

sitegraph

- sourcetype: sitegraph
  name: aquadocs
  logo: ""
  url: https://oih.aquadocs.org/aquadocs.json
  headless: false
  pid: http://hdl.handle.net/1834/41372
  propername: AquaDocs
  domain: https://aquadocs.org
  active: false
  credentialsfile: ""
  other: {}
  headlesswait: 0
  delay: 0
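Before adding a source entry to a configuration, it can help to sanity-check it. The following Python sketch assumes the entry has already been parsed into a dictionary (for example by a YAML library) and checks only a few of the fields shown above. The function `validate_source` and the rules it applies are illustrative assumptions, not Gleaner's own validation.

```python
from urllib.parse import urlparse

# The two source types described in this section.
ALLOWED_SOURCE_TYPES = {"sitemap", "sitegraph"}


def validate_source(entry: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if entry.get("sourcetype") not in ALLOWED_SOURCE_TYPES:
        problems.append("sourcetype must be 'sitemap' or 'sitegraph'")
    if not entry.get("name"):
        problems.append("name is missing or empty")
    # Assumption: the url should be an absolute http(s) URL.
    parsed = urlparse(entry.get("url") or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url must be an absolute http(s) URL")
    # Booleans are checked strictly so that strings like "false" are flagged.
    for flag in ("headless", "active"):
        if flag in entry and not isinstance(entry[flag], bool):
            problems.append(flag + " must be true or false")
    return problems


# The sitegraph example above, written as a Python dictionary.
AQUADOCS = {
    "sourcetype": "sitegraph",
    "name": "aquadocs",
    "url": "https://oih.aquadocs.org/aquadocs.json",
    "headless": False,
    "active": False,
}

if __name__ == "__main__":
    print(validate_source(AQUADOCS))
    print(validate_source({"sourcetype": "rss", "name": "", "url": "aquadocs.json"}))
```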

Example config file

A complete configuration file follows. You can download the file here.

context:
  cache: true
  strict: true
contextmaps:
  - prefix: "https://schema.org/"
    file: "./schemaorg-current-https.jsonld"  # wget http://schema.org/docs/jsonldcontext.jsonld
  - prefix: "http://schema.org/"
    file: "./schemaorg-current-https.jsonld"  # wget http://schema.org/docs/jsonldcontext.jsonld
gleaner:
  mill: true
  runid: runX
  summon: true
summoner:
  after: ""
  delay:  # milliseconds (1000 = 1 second) to delay between calls (will FORCE threads to 1)
  headless: http://localhost:9222
  mode: full
  threads: 5
millers:
  graph: true
minio:
  address:
  port:
  ssl:
  accesskey:
  secretkey:
  bucket:
sources:
- sourcetype: sitemap
  name: unidata
  logo: ""
  url: https://www.unidata.ucar.edu/sitemap.xml
  headless: false
  pid: https://www.re3data.org/repository/r3d100010355
  propername: UNIDATA
  domain: http://www.unidata.ucar.edu/
  active: false
  credentialsfile: ""
  other: {}
  headlesswait: 0
  delay: 0
- sourcetype: sitegraph
  name: aquadocs
  logo: ""
  url: https://oih.aquadocs.org/aquadocs.json
  headless: false
  pid: http://hdl.handle.net/1834/41372
  propername: AquaDocs
  domain: https://aquadocs.org
  active: false
  credentialsfile: ""
  other: {}
  headlesswait: 0
  delay: 0
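
The comment on the summoner `delay` setting in the configuration above says that setting a delay forces the thread count to 1. A minimal Python sketch of that rule follows. The function `effective_threads` is hypothetical, and treating an empty or zero delay as "no delay" is an assumption drawn from the comment rather than from Gleaner's code.

```python
def effective_threads(threads: int, delay_ms=None) -> int:
    """Return the thread count after applying the delay rule.

    Assumption: an empty (None) or zero delay means no delay is requested.
    """
    if delay_ms:
        # A delay between calls only makes sense with a single worker.
        return 1
    return threads


if __name__ == "__main__":
    print(effective_threads(5))        # delay left empty, as in the example file
    print(effective_threads(5, 1000))  # 1000 ms = 1 second between calls
```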