Face recognition tooling

Some scripts to find and tag faces in photograph image files, 'cause Picasa is dead.



# What do all the scripts do?

This stuff:

  • schema.sql: SQLite3 database schema, use this to create an empty db
  • face_picasa: loads the face_picasa table with data from .picasa.ini files; use it with find like this: % find /my/photo/store -name .picasa.ini -print0 | xargs -0 ./face_picasa -d /my/database.db
  • face_weights: generates facial vector weights from the loaded Picasa data and image files
  • face_scanner_task: scans the specified folder path(s), detects new faces and attempts to match them with Picasa data & labelled groups
  • face_regroup: re-runs matching to labelled groups with specified parameters (threshold distance)
  • face_cluster: re-runs matching to existing face clusters with varying weights / thresholds
  • tflite_test.py: test program to check that I can run TensorFlow Lite as a pre-screening filter (much faster than the dlib CNN face detector)
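Bridging face_picasa and face_weights means turning Picasa's face rectangles into something a face encoder can use. The .picasa.ini `faces=` entries are commonly reverse-engineered as `rect64(<hex>),<contact id>` pairs separated by `;`, where the hex value packs left, top, right and bottom as 16-bit fractions of the image size. A minimal sketch assuming that layout; `decode_rect64`, `to_face_location` and `parse_faces` are hypothetical names, not functions from these scripts. The output tuple uses the (top, right, bottom, left) pixel order that face_recognition's `face_encodings(..., known_face_locations=...)` expects:

```python
import re

# Assumed faces= layout: rect64(<hex>),<contact id>;rect64(<hex>),<contact id>...
FACE_ENTRY = re.compile(r"rect64\(([0-9a-fA-F]{1,16})\),([0-9a-fA-F]+)")


def decode_rect64(hex_value: str) -> tuple[float, float, float, float]:
    """Return (left, top, right, bottom) as fractions of image width/height."""
    # Leading zeros may be dropped, so pad back to 16 hex digits (4 x 16-bit fields).
    padded = hex_value.zfill(16)
    fields = [int(padded[i:i + 4], 16) / 65535 for i in range(0, 16, 4)]
    return fields[0], fields[1], fields[2], fields[3]


def to_face_location(hex_value: str, width: int, height: int) -> tuple[int, int, int, int]:
    """Convert to a (top, right, bottom, left) pixel box for an image of width x height."""
    left, top, right, bottom = decode_rect64(hex_value)
    return (round(top * height), round(right * width),
            round(bottom * height), round(left * width))


def parse_faces(value: str) -> list[tuple[str, str]]:
    """Split a faces= value into (rect64 hex, contact id) pairs."""
    return FACE_ENTRY.findall(value)
```

The contact ids would then be resolved to names via the contacts data Picasa keeps alongside; that lookup is left out here.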

# Proper design time

As per my blog, it's time to think a little about information flow, integration points and process life cycles:

  • UX: labelled faces are available to use in any image manager app as additional image info.
  • UX: unlabelled faces are grouped and presented for labelling, possibly via extension to gThumb or standalone app.
  • UX: face detection and labelling takes place as soon as images are available (not just while viewer is active: a Picasa peeve).
  • UX: new unlabelled faces are notified via selectable channels (cron output?) so the humans can help.
  • UX: moving images around, re-naming folders etc. does not trigger re-labelling.
  • UX: labels do not change while viewing faces (this annoys me a lot in Picasa!), but can be refreshed.

These UX stories lead design towards the following principles:

  • Invariant image identification: hashes, not pathnames. This also has a beneficial side effect of duplicate detection.
  • Face labels stored with images: providing shared access (yet another Picasa peeve), with mappers or plugins to viewer apps.
  • Scheduled task to process images (using find -ctime and a last-run marker?): hash, detect, label & notify unknowns.
  • Scheduled task to handle removal: 2-pass search using mark/release strategy, part of image processing task maybe?
  • Viewer adapters (mappers/plugins): can draw labelled boxes on image(s) in view, and uphold the no-changes-while-viewing policy.
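Invariant identification by hash can be sketched with the standard library: hash each file's contents in chunks, then group paths by hash, so a moved or renamed file keeps its identity and duplicates fall out for free. SHA-256, the suffix list and the helper names `file_hash`, `index_images` and `duplicates` are assumptions for illustration:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file contents, read in chunks so large images don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_images(root: Path, suffixes=(".jpg", ".jpeg", ".png")) -> dict[str, list[Path]]:
    """Map content hash -> every path under root holding that content."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            by_hash[file_hash(path)].append(path)
    return dict(by_hash)


def duplicates(by_hash: dict[str, list[Path]]) -> dict[str, list[Path]]:
    """Keep only hashes that appear at more than one path."""
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Because faces would be keyed on the hash rather than the path, re-running the scan after a folder rename only changes the path rows, not the face data.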

Early decisions:

  • Metadata in images or separate files?
    • Not all formats support metadata.
    • Editing source files is bad (invariance broken).
    • How to manage file movements if separate (via new images/removal processes, greatly assisted by hash identification, not paths)?
    • Metadata in SQLite, which points to a series of storage paths that form the corpus of images. Published schema :)
    • file hash <= file path mapping. Updated/created/removed by scheduled scanner. Index both columns for lookup.
    • file hash <= face data (rectangles, pickled vector of face descriptor values) mapping.
    • face data => face group mapping.
    • face group => face label mapping (supports multiple averages).
  • Scanner outputs:
    • New/Removed file paths & hashes, duplicates indicated.
    • New/Removed faces (by label/group/path).
    • New Unknown faces (by group/path).
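The mappings above could be laid out roughly like this in SQLite. This is a hypothetical schema for illustration only (the published schema.sql is the real one): paths and faces both hang off the content hash, descriptors are pickled lists of floats, and each group keeps a pickled average descriptor. The `nearest_group` helper, also hypothetical, matches a new face to the closest group by Euclidean distance, using face_recognition's default `compare_faces` tolerance of 0.6 as the threshold:

```python
import math
import pickle
import sqlite3

# Hypothetical layout for illustration; the repo's schema.sql is authoritative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_path (
    hash TEXT NOT NULL,               -- content hash: invariant identity
    path TEXT NOT NULL UNIQUE         -- UNIQUE also gives path an index
);
CREATE INDEX IF NOT EXISTS file_path_by_hash ON file_path(hash);

CREATE TABLE IF NOT EXISTS face_group (
    id INTEGER PRIMARY KEY,
    label TEXT,                       -- NULL until a human names the group
    centroid BLOB NOT NULL            -- pickled average descriptor
);

CREATE TABLE IF NOT EXISTS face (
    id INTEGER PRIMARY KEY,
    hash TEXT NOT NULL,               -- image this face came from
    top_px INTEGER, right_px INTEGER, bottom_px INTEGER, left_px INTEGER,
    descriptor BLOB NOT NULL,         -- pickled list of floats
    group_id INTEGER REFERENCES face_group(id)
);
CREATE INDEX IF NOT EXISTS face_by_hash ON face(hash);
"""


def nearest_group(conn: sqlite3.Connection, descriptor, threshold: float = 0.6):
    """Return (group_id, label) of the closest centroid within threshold, or None."""
    best, best_dist = None, threshold
    for group_id, label, blob in conn.execute("SELECT id, label, centroid FROM face_group"):
        dist = math.dist(descriptor, pickle.loads(blob))  # Euclidean distance
        if dist <= best_dist:
            best, best_dist = (group_id, label), dist
    return best
```

A face that matches no group within the threshold is a new unknown for the scanner to report; lowering or raising the threshold is the kind of knob face_regroup is described as re-running with.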


Sorry, this is where it gets ugly (thanks Google, Debian, etc.). I am using the --break-system-packages option below to force installation into the system environment (despite PEP 668), as I'm happy this doesn't conflict with existing Debian-maintained things. YMMV:

  • Adam Geitgey's face_recognition package, on PyPI: pip3 install --break-system-packages face_recognition (compiling dlib is sloow).
  • Google TensorFlow Lite runtime tflite-runtime-nightly, on PyPI: pip3 install --break-system-packages tflite-runtime-nightly

I originally pulled the tflit package from PyPI, but its install checks don't know about the nightly runtime, so I now keep its code in a local sub-folder, mangled to skip those checks: ick, but it works (tm).