Context-sensitive, fractal fuzzy matching. A research idea I'm interested in pursuing. I believe it is suitable for fighting spam, detecting polymorphic shellcode (and malware in general), forensic analysis, static analysis, and many other applications.

  • Fuzzy matching within a "linguistic" alphabet: solved problem (Bayes, winnowing hashes, n-grams, etc.)
  • Actual problem domain: multilevel context (archives, PDF, MIME mail, etc.)
  • Solution: context-aware (fractal-based analysis?) multilevel encoding to a common alphabet
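
As a rough illustration of the three points above, here is a minimal Python sketch. It recursively peels container layers, maps each leaf payload to one common alphabet, and only then applies an ordinary n-gram comparison. Everything in it is an assumption made for illustration: the names `unwrap`, `normalize`, `ngrams` and `similarity` are hypothetical, the only layers handled are gzip, zip and base64, the "common alphabet" is simply lowercased, whitespace-collapsed text, and the score is plain Jaccard similarity over character trigrams. It contains no fractal analysis; the recursion only stands in for the self-similar nesting of contexts.

```python
import base64
import binascii
import gzip
import io
import zipfile
import zlib


def unwrap(data: bytes, depth: int = 0, max_depth: int = 8) -> list[bytes]:
    """Recursively peel known container layers and return the leaf payloads."""
    if depth >= max_depth:  # guard against decompression bombs and loops
        return [data]
    if data[:2] == b"\x1f\x8b":  # gzip magic bytes
        try:
            return unwrap(gzip.decompress(data), depth + 1, max_depth)
        except (OSError, EOFError, zlib.error):
            return [data]
    if data[:4] == b"PK\x03\x04":  # zip local file header
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                leaves: list[bytes] = []
                for name in zf.namelist():
                    leaves.extend(unwrap(zf.read(name), depth + 1, max_depth))
                return leaves
        except zipfile.BadZipFile:
            return [data]
    # Crude base64 heuristic (an assumption): long enough, length a multiple
    # of 4, and strictly valid. Short plain words could be misdetected.
    stripped = b"".join(data.split())
    if len(stripped) >= 16 and len(stripped) % 4 == 0:
        try:
            decoded = base64.b64decode(stripped, validate=True)
            return unwrap(decoded, depth + 1, max_depth)
        except (binascii.Error, ValueError):
            pass
    return [data]  # no known layer left: treat as a leaf


def normalize(data: bytes) -> str:
    """Map a leaf to the common alphabet: lowercase, collapsed whitespace."""
    text = data.decode("utf-8", errors="replace").lower()
    return " ".join(text.split())


def ngrams(text: str, n: int = 3) -> set[str]:
    """Set of character n-grams (the whole text if it is shorter than n)."""
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}


def similarity(a: bytes, b: bytes, n: int = 3) -> float:
    """Jaccard similarity of the n-gram sets of all unwrapped leaves."""
    grams_a = set().union(*(ngrams(normalize(leaf), n) for leaf in unwrap(a)))
    grams_b = set().union(*(ngrams(normalize(leaf), n) for leaf in unwrap(b)))
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 1.0


if __name__ == "__main__":
    message = b"Buy cheap watches now, limited offer!!!"
    wrapped = base64.b64encode(gzip.compress(message))
    print(similarity(message, wrapped))
```

In this sketch the base64-of-gzip copy unwraps to the same leaf as the plain message, so the two compare as identical even though their raw bytes share almost nothing. A real design would need per-format parsers (PDF streams, MIME parts) and a less naive common alphabet.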

See also ssdeep.