~rkta/h2g

HTML to gemtext converter

fcabe87 Avoid potential UB when int is more then 32 bits

~rkta pushed to ~rkta/h2g git

30 days ago

README

  h2g is an HTML to gemtext converter. It reads HTML from stdin and writes
  gemtext to stdout, handling a subset of HTML elements and entities.
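
  Because h2g is a plain stdin/stdout filter, it can be driven from a
  script. The following is a minimal sketch, not part of h2g: the helper
  name html_to_gemtext and its cmd parameter are hypothetical, and it
  assumes an h2g binary is on PATH.

```python
import subprocess


def html_to_gemtext(html, cmd=("h2g",)):
    """Pipe `html` to the converter's stdin and return what it writes to stdout.

    `cmd` is an assumption of this sketch: it defaults to an `h2g` binary
    found on PATH and can be replaced by any other stdin/stdout filter.
    """
    result = subprocess.run(
        list(cmd),
        input=html,           # sent to the child's stdin
        capture_output=True,  # collect stdout (and stderr)
        text=True,            # work with str instead of bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

  Usage would look like html_to_gemtext(open("page.html").read()), with
  the result written to a .gmi file.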

  The following HTML elements are recognized; everything else is ignored:
  * <a href=>
	A reference number is inserted in place of the link, and the link is
	added to a list at the bottom of the document. Links to element
	identifiers are ignored. In relative local links (starting with '.'), a
	'.html' suffix is replaced with '.gmi'.
  * <b>
	Element is surrounded with '*'.
  * <br>
	A line break is enforced.
  * <em>, <i>, <u>
	Element is surrounded with '_'.
  * <h1> to <h6>
	Content is put on a single line and prefixed with the corresponding
	number of '#'. Block is enclosed with empty lines.
  * <img>
	Alt text is printed in place of the image and the source is added to
	the footnote link list.
  * <p>
	Block is enclosed with empty lines.
  * <pre>, <blockquote>
	Content is written as is, dropping leading and trailing empty lines.
	Block is enclosed with empty lines.
  * <table>, <tr>, <th>, <td>
	Tables are surrounded with empty lines. Each row is printed on a
	single line. A literal tab character is inserted between two <td>
	elements. <th> is treated the same as <td>.
  * <li> inside <ol>
	Each <li> element is printed to a single line prefixed with a
	consecutively increasing number. Block is enclosed with empty lines.
  * <li> inside <ul>
	Each <li> element is printed to a single line prefixed with '*'. Block
	is enclosed with empty lines.
  * <s>
	For every word in the element, a ^W is printed after the element.
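
  Some of the inline rules above can be sketched in a few lines. This is
  an illustration only, not h2g's implementation: the function names are
  hypothetical, and a "word" is assumed to mean a whitespace-separated
  token.

```python
def bold(text):
    """<b>: surround the content with '*'."""
    return "*" + text + "*"


def emphasis(text):
    """<em>, <i>, <u>: surround the content with '_'."""
    return "_" + text + "_"


def strike(text):
    """<s>: keep the content and append one ^W per word.

    Words are assumed to be whitespace-separated tokens.
    """
    return text + "^W" * len(text.split())


def heading(level, content):
    """<h1> to <h6>: put the content on one line after `level` '#' characters."""
    one_line = " ".join(content.split())  # collapse line breaks and runs of spaces
    return "#" * level + " " + one_line
```

  For instance, strike("A sentence") gives "A sentence^W^W" and
  heading(2, "H2") gives "## H2", matching the EXAMPLE output.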


CAVEATS

  * All input is ignored until a <body> element is found!
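
  One consequence is that a bare HTML fragment needs a <body> tag before
  it is piped to h2g. A minimal sketch of a pre-processing step, assuming
  a simple substring check is good enough (the helper name ensure_body is
  hypothetical, and a "<body" inside a comment would fool it):

```python
def ensure_body(html):
    """Wrap `html` in <body>...</body> unless a <body> tag is already present.

    The check is a naive, case-insensitive substring search, not an HTML parse.
    """
    if "<body" in html.lower():
        return html
    return "<body>\n" + html + "\n</body>\n"
```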


BUGS PATCHES FEATURE REQUESTS QUESTIONS INSULTS

  mail@rkta.de


EXAMPLE

Input:
------

<!DOCTYPE html>
<html lang="en">
<head>
<title>TITLE</title>
</head>
<body>
<header>
<H1>H1</H1>
</header>
<h2>H2</h2>
<p><s>A sentence</s>Paragraph <em>with</em> an <u>important</u>
<a href="./local.html"><b>local</b> link</a>.</p>
<img alt='alt text' src='./img.png'>
<pre>
	Pre-formatted
		text
</pre>
break<br>row
<ul> <li>List entry</li> </ul>
<ol> <li>Ordered list entry</li> </ol>
<table>
	<tr><th>Entity</th><th>Symbol</th></tr>
	<tr><td>&amp;amp;</td><td>&amp;</td></tr>
	<tr><td>&amp;apos;</td><td>&apos;</td></tr>
	<tr><td>&amp;gt;</td><td>&gt;</td></tr>
	<tr><td>&amp;lt;</td><td>&lt;</td></tr>
</table> </body> </html>


Output:
-------

# H1

## H2

A sentence^W^WParagraph _with_ an _important_ *local* link[0].

alt text[1]

```
	Pre-formatted
		text
```

break
row

* List entry

* 1) Ordered list entry

Entity	Symbol
&amp;	&
&apos;	'
&gt;	>
&lt;	<

=> ./local.gmi [0] local link
=> ./img.png [1] alt text
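

  The two link lines at the end of the output follow from the <a href=>
  and <img> rules. A rough sketch of that logic, not taken from h2g's
  source: the function names are hypothetical, "links to element
  identifiers" is assumed to mean targets starting with '#', and it is
  assumed that ignored links do not consume a reference number.

```python
def rewrite_target(target):
    """Relative local links (starting with '.') get '.html' replaced by '.gmi'."""
    suffix = ".html"
    if target.startswith(".") and target.endswith(suffix):
        return target[: -len(suffix)] + ".gmi"
    return target


def link_list(links):
    """Build the footnote lines for (target, label) pairs, numbered from 0.

    Assumption: targets starting with '#' are skipped without using a number.
    """
    lines = []
    for target, label in links:
        if target.startswith("#"):
            continue
        lines.append(f"=> {rewrite_target(target)} [{len(lines)}] {label}")
    return lines
```

  Called with [("./local.html", "local link"), ("./img.png", "alt text")],
  link_list returns the two "=>" lines shown above.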