Context-sensitive, fractal fuzzy matching. A research idea I would like to pursue. I believe it is suitable for fighting spam, detecting polymorphic shellcode (and malware in general), forensic analysis, static analysis, and many other applications.
- Fuzzy matching within a single "linguistic" alphabet: a solved problem (Bayes, winnowing hashes, n-grams, etc.)
- Actual problem domain: multilevel context (archives, PDF, MIME mail, etc.)
- Solution: context-aware (fractal-based analysis?) multilevel encoding to a common alphabet
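
A minimal sketch of the "decode to a common alphabet, then fuzzy match" part of the idea, in Python 3.9+. It is an illustration under stated assumptions, not the proposed method: the helper names (`unwrap`, `ngrams`, `similarity`) are hypothetical, only two wrapper layers (gzip and base64) are recognized, raw bytes stand in for the common alphabet, and byte n-gram Jaccard similarity stands in for the fuzzy matcher. The fractal-based analysis is not attempted here.

```python
import base64
import binascii
import gzip
import zlib


def unwrap(data: bytes, max_depth: int = 8) -> bytes:
    """Peel known wrapper layers until none applies (hypothetical helper)."""
    for _ in range(max_depth):
        if data[:2] == b"\x1f\x8b":  # gzip magic number
            try:
                data = gzip.decompress(data)
                continue
            except (OSError, EOFError, zlib.error):
                break  # looked like gzip but was not; stop peeling
        try:
            # Assumption: input that is strictly valid base64 is a wrapper.
            # Short plain words can pass this check, so this is a heuristic.
            decoded = base64.b64decode(data, validate=True)
        except (binascii.Error, ValueError):
            break  # not base64: treat as the innermost layer
        if not decoded:
            break
        data = decoded
    return data


def ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """Set of all overlapping byte n-grams in data."""
    return {data[i:i + n] for i in range(max(len(data) - n + 1, 0))}


def similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the n-gram sets of the unwrapped inputs."""
    ga, gb = ngrams(unwrap(a), n), ngrams(unwrap(b), n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


if __name__ == "__main__":
    plain = b"Buy cheap pills now, limited offer!"
    wrapped = base64.b64encode(gzip.compress(plain))
    variant = base64.b64encode(b"Buy cheap pi11s now, limited offer!")
    print(similarity(plain, wrapped))   # same content behind two layers
    print(similarity(plain, variant))   # near-duplicate behind one layer
```

The point of the sketch is that matching happens only after every layer has been reduced to the same representation, so a message and its compressed, re-encoded copy compare as identical. A real implementation would need format-aware parsers for archives, PDF, and MIME rather than magic-number guesses.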
See also ssdeep.