Update Orgmode IDs externally and asynchronously, using find and grep.

  1. org-id-update-external
    1. Introduction
    2. Installation
    3. Configuration
    4. Usage
    5. FAQ
      1. Why use an external shell script?
      2. What about running this on Windows?
    6. Roadmap
      1. Add timestamps to messages
      2. Modify function org-id-ext-update-id-locations-nil to ask if user wants to run org-id-ext-update-id-locations manually
      3. Implement reverse lookup functionality (i.e., 'What links to here?')
    7. Changelog
    8. Hacking
      1. File, variable, and function names and how data flows through them
      2. External shell script on remote machine

#org-id-update-external

If you are viewing this as a Markdown file, and/or a web page, it is an exported version of the original file (README.org), which is best viewed in Orgmode.

#Introduction

Do you have a lot of Orgmode files, with many id: links throughout?

Annoyed by how long the function org-id-update-id-locations takes to run (especially given that Emacs is single threaded, and this is a blocking action)?

Even more annoyed that org-id-update-id-locations needs to open each and every one of your Orgmode files just to check them for IDs? And then has the nerve not to clean up after itself, instead leaving them lying around all over the place, like some sort of animal?!

If so, this package may be for you.

We replace[1] the built-in function org-id-update-id-locations with one of our own, which asynchronously calls an external shell script. That script in turn uses (GNU) find and grep to search a given list of directories for IDs in any Orgmode files it finds.
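
To give a feel for the approach, here is a minimal sketch of such a search. It is not the script shipped with this package: the function name list_org_ids is made up for illustration, and the sketch assumes that IDs are stored as :ID: properties and that Orgmode files end in .org.

```shell
#!/bin/sh
# Hypothetical helper (NOT the package's org-id-update-external.sh):
# print every line holding an :ID: property in .org files under the
# directories given as arguments.
list_org_ids() {
    # find:  -type f -name '*.org'  selects regular files ending in .org
    # grep:  -H  always prefix each match with its file name
    #        -i  match case-insensitively (assumes :id: should count too)
    #        -E  use extended regular expressions
    find "$@" -type f -name '*.org' -exec \
        grep -HiE '^[[:space:]]*:ID:[[:space:]]+[^[:space:]]+' {} +
}
```

Called as list_org_ids ~/org, this would print one "file:matching line" pair for each ID it finds. No Emacs buffers are opened at any point, which is the main attraction of doing the search outside Emacs.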

#Installation

  1. The easiest way (considering potential updates) for the time being is probably just to clone this repo:

    ~/ $ cd git/ext
    ~/git/ext $ git clone https://git.sr.ht/~trs-80/org-id-update-external
    
  2. Then say something like the following in your init file:

    (add-to-list 'load-path "~/git/ext/org-id-update-external")
    (require 'org-id-update-external)
    

When you require (or load) this package, the built-in function org-id-update-id-locations should be replaced[1] by our version, org-id-ext-update-id-locations-nil, which disables the former by simply returning nil.

  • N.B.: Make sure you load (or require) the org-mode package before this one; otherwise the above might not happen. That would not be the end of the world, but it would not accomplish the raison d'être of this package, either.

#Configuration

At a minimum, you must set the following required variable (which see):

  • org-id-ext-update-script-file - (default nil) This must be set manually, because installation locations can vary.

There are also some other optional ones you may be interested in looking at (which see):

  • org-id-ext-file-skip-regex-list - (default '("sync-conflict-[0-9]\\{8\\}-"))

  • org-id-ext-input-dirs - (default '("~/"))

  • org-id-ext-debug - (default nil)

N.B.: Although the update function (org-id-ext-update-id-locations) runs asynchronously when invoked from within Emacs, it still ultimately calls the external shell script. Therefore, make sure you set the required option above (which might otherwise seem relevant only to the shell script).

#Usage

Simply do one of the following:

  1. Invoke the function org-id-ext-update-id-locations from within Emacs (ultimately, this calls the below shell script anyway, although asynchronously and providing required arguments as set in variables for this package).

  2. Invoke the external shell script (org-id-update-external.sh) directly, either manually or via a system scheduler (e.g., cron or a systemd timer).

    1. If you call the script with no arguments, it will output usage advice.

    2. This can even be done on a remote machine.

#FAQ

#Why use an external shell script?

  1. Doing it that way, there is no need to open tons of additional buffers in Emacs, and leave them hanging around.
  2. Because it is faster.
  3. Because it provides more flexibility:
    1. It may be invoked asynchronously, either by Emacs itself, or even by a cron job on your system.
    2. It can even be run on another machine (assuming the paths to the Orgmode files are the same).

The first two were probably the major annoyances driving the creation of this project; the third ended up being an implementation detail.

In fact, 3.2 came out of the author's particular use case: my Orgmode files (amongst others) are on a network-mounted (NFS) drive, and running find directly on the remote machine was 14x faster in testing.

#What about running this on Windows?

To say that I have no interest in supporting proprietary software would not be correct. I actually have a vehement disdain for doing so.

#Roadmap

In roughly descending order of priority, and with no promises as to when, I would like to someday implement the following:

#SOMEDAY/MAYBE Add timestamps to messages

#SOMEDAY/MAYBE Modify function org-id-ext-update-id-locations-nil to ask if user wants to run org-id-ext-update-id-locations manually

#SOMEDAY/MAYBE Implement reverse lookup functionality (i.e., 'What links to here?')

  • A second external shell script (or maybe the existing one, modified to take a new argument) to do a reverse lookup (i.e., 'What links to here?') search.
    • The current implementation follows the built-in function org-id-update-id-locations, which builds an association list mapping file names to the IDs contained therein. That is fine if you are looking for the target of a link (to go to it, as you do when following an Orgmode link).

    • However, if your query is 'What links to here?', this mapping is of no use. We instead need to perform a different search (for links, not IDs) and put the results into a different association list. That is what we are talking about implementing here.

      • If we are careful about the implementation, we could maybe do it at the same time as the original search(?). I guess that implies being in the same function. I will have to think about it and play with the implementation to see what works most efficiently.
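
As a rough illustration of that different search (this is not an implemented feature of the package), a reverse lookup for a single ID could be sketched with find and grep. The function name find_backlinks is hypothetical, and the sketch assumes links are written in the usual [[id:...]] or [[id:...][description]] form.

```shell
#!/bin/sh
# Hypothetical sketch (not part of the package): list the .org files
# that contain a link to the given ID.
# Usage: find_backlinks ID DIR [DIR...]
find_backlinks() {
    target_id=$1
    shift
    # grep: -l  print only the names of files that contain a match
    #       -F  treat the pattern as a fixed string, so the square
    #           brackets need no escaping
    # The trailing "]" matches both [[id:X]] and [[id:X][description]],
    # and keeps ID "X" from also matching links to a longer ID "XY".
    find "$@" -type f -name '*.org' -exec \
        grep -lF "[[id:${target_id}]" {} +
}
```

Turning this into the second association list mentioned above (link target mapped to the files linking to it) would still need to happen on the Emacs side.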

#Changelog

  • 2023-04-04, 0.1.0
    • Initial public release.

#Hacking

This section is intended for people who want to modify or extend the package, use it in novel ways, or simply better understand how it works.

#File, variable, and function names and how data flows through them

There are two main (and distinct) steps:

  1. External shell script (org-id-update-external.sh, included) writes output to org-id-ext-intermediate-file.

    1. This may be invoked:
      1. Externally (as a shell script).

        1. This can even be done on a remote machine.
      2. By calling the function org-id-ext-update-id-locations from within Emacs.

        1. This function runs the shell script asynchronously. When that completes, the function org-id-ext-update-process-sentinel is called, which in turn calls org-id-ext-process-id-locations (continue below).
  2. The function org-id-ext-process-id-locations:

    1. Reads in the results of the external shell script (from org-id-ext-intermediate-file).

    2. Checks each file name against each regex in org-id-ext-file-skip-regex-list.

    3. The output of this function is appropriate for the variable org-id-locations, so we:

      1. Write the output to org-id-ext-output-file (which, by default, is set to org-id-locations-file).

      2. Convert it to a hash table, and store it in the variable org-id-locations.

      3. N.B.: This is essentially what the original org-id-update-id-locations function does as well (in fact it was modeled after that).

    4. Optionally (when the variable org-id-ext-debug is non-nil), output some debugging buffers. See the docstring for a list of buffers which will be output.
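
The package performs the check in step 2.2 in Emacs Lisp. Purely to illustrate what the default entry in org-id-ext-file-skip-regex-list matches, the same pattern can be tried out with grep, whose basic regular expressions happen to share the \{8\} interval syntax with Emacs. The function name and the sample file names below are made up.

```shell
#!/bin/sh
# Hypothetical filter (the package does this step inside Emacs):
# read file names on stdin and drop those matching the default skip
# pattern, i.e. "sync-conflict-", then exactly eight digits, then "-".
skip_conflict_files() {
    # -v inverts the match, so only non-matching names are printed.
    grep -v 'sync-conflict-[0-9]\{8\}-'
}

# Example (kept as a comment so that sourcing this file prints nothing):
#   printf '%s\n' '~/org/notes.org' \
#       '~/org/notes.sync-conflict-20230404-101500-ABCDEFG.org' |
#       skip_conflict_files
```

With those two sample names, only ~/org/notes.org passes through the filter.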

#External shell script on remote machine

One of the main reasons the shell script is external is so that you can run it on some remote machine.

There are a couple of caveats, however:

  1. The file paths must be the same.

    1. This works well if, like the author, you either:

      1. Sync some files across devices, but always at the same path location, and/or

      2. Mount remote network drives on various devices, but always at the same path location.

    2. Interestingly, if the paths are anywhere under your $HOME directory, this works even if your user name is different on the different machines.

      1. This is because, like the original org-id-update-id-locations, we also use the function abbreviate-file-name to shorten each path (e.g., /home/user/ becomes simply ~/) before pushing it into the variable org-id-locations.
  2. You must sync, copy, or somehow transfer[2] the org-id-ext-intermediate-file across the network to the local machine, to then be imported/processed by the function org-id-ext-process-id-locations.

#Footnotes

[1] By 'replace(d)' we mean via fset. The original is still there (no Orgmode functions were harmed in the making of this package, lol).

[2] The author uses Syncthing for this.