~ols/veri

#Veri

The FOSS search engine

An early development version is running at veri.ols.wtf

#What is Veri?

Veri aims to be a FOSS search engine that can be deployed by anyone to scrape, index, and search a given subset of URLs and their immediate connections.

The workflow of Veri will be as follows:

  1. A list of "tier 1" URLs is kept up to date by the instance operator
  2. Veri will periodically scrape those URLs to get a list of "tier 2" URLs
  3. Veri will periodically scrape the entire list of URLs to obtain page contents and metadata, to be stored in a database

Veri currently looks a little like this:

% cat > start-links
https://ols.wtf/link.gmi
https://webring.xxiivv.com
^D
% while read -r link ; do veri-links "$link" ; cat links ; done <start-links |sort -u >all-links
% veri-scrape all-links
% veri-index results/ programmer
2020/12/11 17:30:30 programmer
7 http://fragmentscenario.com/ fragment scenario fragment scenario records and processes fragments of human life and its surroundings
21 https://2d4.dev/tw.txt
64 https://electro.pizza/twtxt.txt
72 https://feed.amorris.ca/hallway.txt
85 https://gueorgui.net/feed.xml Gueorgui Tcherednitchenko
93 https://iko.soy ilyakooo0 если что-то не так, то напишите мне плиз
108 https://longest.voyage Longest Voyage Jamie Crisman's projects and occasionally updated blog
149 https://patrikarvidsson.com Patrik Arvidsson Patrik Arvidsson – Personal wiki engine
155 https://resevoir.net resevoir・index foliage in south England
164 https://roytang.net/index.xml EVERYTHING on roytang.net
178 https://teknari.com Teknari PHOTOGRAPHY
186 https://twitter.com/heyitsols  Something went wrong, but don’t fret — let’s give it another shot.
204 https://www.johannesg.com The Portfolio of Jóhannes G. Þorsteinsson Send me an e-mail, connect with me on Mastodon, or maybe even on Twitter.
211 https://www.mentalnodes.com Index I created this public notebook / digital garden because I believe the only way to learn in public is to build in public.
217 https://www.romainaubert.com/twtxt.txt
224 https://xuv.be xuv.be [xuv = exuvie / exuvia / exuvium] Exoskeletons of Julien Deswaef. Portfolio of projects.
226 https://xvw.github.io/atom.xml xvw - planet
230 https://zvava.org zvava.org
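
The link-discovery step of the workflow above (tier 1 pages yielding tier 2 URLs) can be sketched in a few lines. This is only an illustration, not how veri-links is implemented: the names `LinkCollector` and `discover_tier2` are hypothetical, the pages are passed in as already-fetched HTML, and dropping tier 1 URLs from the result is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def discover_tier2(tier1_pages):
    """Given {tier1_url: html}, return sorted, de-duplicated tier 2 URLs.

    Assumption: only http(s) links are kept here, and URLs that are
    already tier 1 are excluded from the tier 2 list.
    """
    tier1 = set(tier1_pages)
    found = set()
    for url, html in tier1_pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        found.update(
            link for link in collector.links
            if link.startswith(("http://", "https://"))
        )
    return sorted(found - tier1)
```

The sorted, de-duplicated result plays the same role as the `sort -u` step in the shell session above.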

#Why "Veri"?

The name Veri was chosen for two reasons:

  • veri is a form of the Latin verum, meaning truth, reality, or fact
  • veri is the Turkish word for data

#About the Project

#Goals

The goals of the project are as follows:

  • To be deployable by anyone to create their own specific-interest search engine
  • To understand the www (HTTP/HTTPS), gemini, and gopher schemes
  • To be modular, so that any of the individual components can be deployed without the others
  • To be a good citizen of the Internet, respecting robots.txt and sending a configurable User-Agent that provides contact details for the instance
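
As a rough illustration of the "good citizen" goal, the sketch below checks a URL against robots.txt rules and builds a request with an identifying User-Agent, using only the Python standard library. The `USER_AGENT` value and the function names are hypothetical, and the robots.txt text is passed in directly so the example does not touch the network.

```python
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical value: an instance operator would configure their own
# name and contact address here.
USER_AGENT = "veri-example/0.1 (+https://example.org/contact)"


def build_request(url, user_agent=USER_AGENT):
    """Build a request whose User-Agent tells site owners whom to contact."""
    return Request(url, headers={"User-Agent": user_agent})


def allowed_by_robots(robots_txt, url, user_agent=USER_AGENT):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A scraper would call `allowed_by_robots` before each fetch and skip any URL for which it returns False.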

#Components

The majority of these are either a work in progress or non-existent

#veri-links

Generates a list of discovered ("tier 2") links by scraping a given list of ("tier 1") links.

It will have various flavours, albeit with as much shared code as possible, which are:

  • veri-links-www
  • veri-links-gemini
  • veri-links-gopher

The results will be written to a database, along with whether the link was a direct link or a discovered link
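
One possible shape for that table, sketched with SQLite. The schema, the column names, and the rule that a direct link is never downgraded to discovered are all assumptions made for illustration; the project does not specify a database or layout here.

```python
import sqlite3

# Hypothetical schema: one row per URL, tagged with how it was found.
SCHEMA = """
CREATE TABLE IF NOT EXISTS links (
    url    TEXT PRIMARY KEY,
    origin TEXT NOT NULL CHECK (origin IN ('direct', 'discovered'))
)
"""


def open_db(path=":memory:"):
    """Open (or create) the link database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_link(conn, url, origin):
    """Store a link; an existing 'direct' link is never downgraded."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO links (url, origin) VALUES (?, ?)",
            (url, origin),
        )
        if origin == "direct":
            conn.execute(
                "UPDATE links SET origin = 'direct' WHERE url = ?", (url,)
            )
```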

#veri-scrape

Scrapes a provided list of links to extract content and metadata, including:

  • URL
  • Title
  • Author
  • Content length
  • Summary
  • HTML content (for www sites)
  • Plain text content

It will have various flavours, albeit with as much shared code as possible, which are:

  • veri-scrape-www
  • veri-scrape-gemini
  • veri-scrape-gopher

The results will be written to a database
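
To make the field list above concrete, here is a minimal sketch of pulling those fields out of an HTML page with the standard library. The `Page` record and `extract_page` function are hypothetical names. Several choices are assumptions: author and summary are read from `<meta name="author">` and `<meta name="description">`, the summary falls back to the first 200 characters of text, and content length is measured in bytes of the HTML.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Page:
    url: str
    title: str
    author: str
    content_length: int
    summary: str
    html: str
    text: str


class _MetaParser(HTMLParser):
    SKIPPED = {"script", "style"}  # contents are not visible text

    def __init__(self):
        super().__init__()
        self.title_parts, self.text_parts, self.meta = [], [], {}
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract_page(url, html):
    """Parse one HTML document into a Page record."""
    parser = _MetaParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    return Page(
        url=url,
        title="".join(parser.title_parts).strip(),
        author=parser.meta.get("author", ""),
        content_length=len(html.encode("utf-8")),  # assumption: bytes of HTML
        summary=parser.meta.get("description", text[:200]),
        html=html,
        text=text,
    )
```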

#veri-index

An indexer that will create a Full-Text Search-capable inverted index from the database of entries
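
For readers unfamiliar with the term, an inverted index maps each word to the documents that contain it. The sketch below is a generic in-memory version, not veri-index itself; `tokenize` and `build_index` are hypothetical names, and lowercased `\w+` tokens are a simplifying assumption.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [token.lower() for token in TOKEN.findall(text)]


def build_index(documents):
    """Map each term to {doc_id: term frequency}.

    documents is a mapping of doc_id -> plain text content.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)
```

Looking up a term is then a single dictionary access, which is what makes full-text search over many pages practical.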

#veri-search

For retrieving ranked entries
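
The ranking method is not specified. As one plausible approach, the sketch below scores documents with a simple TF-IDF sum. It assumes an index of the form term -> {doc_id: term frequency}; the `search` function and its parameters are hypothetical.

```python
import math


def search(index, doc_count, query):
    """Rank documents for a query using a simple TF-IDF sum.

    index maps term -> {doc_id: term frequency}; doc_count is the total
    number of indexed documents. Returns (doc_id, score) pairs, best first.
    """
    scores = {}
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms that appear in fewer documents carry more weight.
        idf = math.log(1 + doc_count / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    # Sort by descending score, breaking ties by doc_id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```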

#veri-web

A way of submitting a search query and displaying results on the web

#veri-gemini

A way of submitting a search query and displaying results over gemini

#veri-gopher

A way of submitting a search query and displaying results over gopher

#veri-proxy

A way to view a site through the veri search interface, rather than visiting it directly, that is able to proxy www, gemini, or gopher content to any of the other protocols.
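
As an illustration of what proxying between protocols involves, the sketch below converts a gemini document (gemtext) into HTML. It is a simplified assumption of one direction only, handling headings, link lines, preformatted blocks, and plain paragraphs; `gemtext_to_html` is a hypothetical name.

```python
from html import escape


def gemtext_to_html(gemtext):
    """Convert a subset of gemtext to HTML, one line at a time."""
    out = []
    preformatted = False
    for line in gemtext.splitlines():
        if line.startswith("```"):
            # A ``` line toggles preformatted mode.
            out.append("</pre>" if preformatted else "<pre>")
            preformatted = not preformatted
        elif preformatted:
            out.append(escape(line))
        elif line.startswith("=>"):
            # Link line: "=> URL optional label"
            parts = line[2:].strip().split(maxsplit=1)
            if not parts:
                continue
            url = parts[0]
            label = parts[1] if len(parts) > 1 else url
            # A real proxy would rewrite gemini:// URLs to point back at itself.
            out.append(f'<p><a href="{escape(url, quote=True)}">{escape(label)}</a></p>')
        elif line.startswith("#"):
            level = min(len(line) - len(line.lstrip("#")), 3)
            out.append(f"<h{level}>{escape(line.lstrip('#').strip())}</h{level}>")
        elif line.strip():
            out.append(f"<p>{escape(line)}</p>")
    if preformatted:
        out.append("</pre>")  # close an unterminated block
    return "\n".join(out)
```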

#Get involved

Post on the ~ols/veri-discuss mailing list for discussion or send patches to the ~ols/veri-devel list.

#Roadmap

Until sr.ht has a Kanban board feature, the roadmap is kept on Trello

Full documentation (WIP) is here

Logo re-coloured from here, shared under CC BY-SA 4.0