Supercharge Your DokuWiki with HTML5 Microdata Magic

This technical guide demonstrates how to enhance DokuWiki with HTML5 Microdata support, improving search engine optimization and semantic web capabilities. The article explains the basics of HTML5 Microdata, its advantages over RDF, and provides practical implementation steps including template modifications and parser customizations. Through concrete code examples, it shows how to integrate schema.org vocabularies into DokuWiki's structure, resulting in more meaningful and SEO-friendly content organization.

3 Minutes reading time

Behold the masterpiece that AI hallucinated while reading this post:

"How Little DokuWiki Learned to Speak Search Engine Language"

(after I fed it way too many marketing blogs and memes)

Created using DALL-E 3

AI-Generated: How Little DokuWiki Learned to Speak Search Engine Language

Microdata overview

HTML5 is the new standard for webpage markups. It comes with some pretty cool new features like media support etc. But one of the best things is the Microdata support!

Now, what is Microdata exactly? Using Microdata we can classify HTML elements or regions with meta data like “This is the name of a person” or “Here comes the copyright information”, so the real semantic Web 2.0. This greatly helps search engines to crawl the page content and also enables us to improve the display of the webpage in the search engine results.

HTML5 Microdata supersedes the RDF standard, and Google for instance advises page authors to use Microdata instead of RDF.

So, now lets see some old HTML:

Jane Doe
<img src="janedoe.jpg" />

Professor
20341 Whitworth Institute
405 Whitworth
Seattle WA 98052
(425) 123-4567
<a href="mailto:jane-doe@xyz.edu">jane-doe@illinois.edu</a>

Jane's home page:
<a href="http://www.janedoe.com">janedoe.com</a>

Graduate students:
<a href="http://www.xyz.edu/students/alicejones.html">Alice Jones</a>
<a href="http://www.xyz.edu/students/bobsmith.html">Bob Smith</a>

And here is the HTML with embedded Microdata:

<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <img src="janedoe.jpg" itemprop="image" />

  <span itemprop="jobTitle">Professor</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">
      20341 Whitworth Institute
      405 N. Whitworth
    </span>
    <span itemprop="addressLocality">Seattle</span>,
    <span itemprop="addressRegion">WA</span>
    <span itemprop="postalCode">98052</span>
  </div>
  <span itemprop="telephone">(425) 123-4567</span>
  <a href="mailto:jane-doe@xyz.edu" itemprop="email">
    jane-doe@xyz.edu</a>

  Jane's home page:
  <a href="http://www.janedoe.com" itemprop="url">janedoe.com</a>

  Graduate students:
  <a href="http://www.xyz.edu/students/alicejones.html" itemprop="colleague">
    Alice Jones</a>
  <a href="http://www.xyz.edu/students/bobsmith.html" itemprop="colleague">
    Bob Smith</a>
</div>

Basically it is the same markup, but with embedded Microdata. Cool!. If you want to read more about available Microdata tags, check www.schema.org.

Adding Microdata support to DokuWiki

Adding Microdata tags to DokuWiki is quite simple. DokuWiki comes already with an embedded templating engine, so we can create a custom template and add the Microdata. Here is an example of this webpage, which is also backed by DokuWiki and Microdata(taken from main.php template):

<?php
// must be run from within DokuWiki
if (!defined('DOKU_INC')) die();

?>
<!DOCTYPE HTML>
<html>
<head>
  .. Stripped ...
</head>
<body itemscope itemtype="http://schema.org/WebPage">

        ... Stripped ...

	<div id="contentdetail">
        <div itemprop="breadcrumb" id="trace">

        ... Stripped ...

        <div itemscope itemtype="http://schema.org/TechArticle"><?php tpl_content(false)?>

         ... Stripped ...

</body>
</html>

The tricky part is the generation of the Wiki content. Here we have to tweak the DokuWiki XHTML parsers, so the /inc/parser/xhtml.php file. Here i changed some to to embed the Microdata:

function header($text, $level, $pos) {
    global $conf;

    if(!$text) return; //skip empty headlines

    $hid = $this->_headerToLink($text,true);

    //only add items within configured levels
    $this->toc_additem($hid, $text, $level);

    // adjust $node to reflect hierarchy of levels
    $this->node[$level-1]++;
    if ($level < $this->lastlevel) {
        for ($i = 0; $i < $this->lastlevel-$level; $i++) {
            $this->node[$this->lastlevel-$i-1] = 0;
        }
    }
    $this->lastlevel = $level;

    if ($level <= $conf['maxseclevel'] &amp;&amp;
        count($this->sectionedits) > 0 &amp;&amp;
        $this->sectionedits[count($this->sectionedits) - 1][2] === 'section') {
        $this->finishSectionEdit($pos - 1);
    }

    // write the header
    $this->doc .= DOKU_LF.'<h'.$level;
    if ($level <= $conf['maxseclevel']) {
        $this->doc .= ' class="' . $this->startSectionEdit($pos, 'section', $text) . '"';
    }
    if ($level == 1) {
        $this->doc .= ' itemprop="name"';
    }
    $this->doc .= ' id="'.$hid.'">';
    $this->doc .= $this->_xmlEntities($text);
    $this->doc .= "</h$level>".DOKU_LF;
}


function section_open($level) {
    if ($level == 1) {
        $this->doc .= '<div itemprop="articleBody" class="level' . $level . '">' . DOKU_LF;
    } else {
        $this->doc .= '<div class="level' . $level . '">' . DOKU_LF;
    }
}

Now my DokuWiki creates SEO friendly semantic Web 2.0 pages. Cool!

Git revision: 2e692ad