Topic: Lexical analysis
no photo
Wed 03/16/11 11:44 PM
Someone sent me a letter recently. It has words that are misspelled in a unique and characteristic way, which suggests that the person who gave me the letter lied to me about who wrote it. (This person has lied before.)

I'm curious whether there are any tools out there, preferably free ones, designed to analyze text and find other similarities. Something like a forensics tool that determines a person's "lexical fingerprint" and then checks whether that person is likely to have authored a document?
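
As a rough illustration of the misspelling angle, here is a minimal Python sketch that lists words appearing in both documents but not in a reference word list. The function names and the `known_words` set are made up for this example; in practice the reference list would come from a dictionary file, and names or slang would show up as false hits.

```python
import re

def words(text):
    # Lowercase word tokens; apostrophes are kept so contractions stay whole.
    return set(re.findall(r"[a-z']+", text.lower()))

def shared_odd_spellings(doc_a, doc_b, known_words):
    # Words present in BOTH documents but absent from the reference word list.
    # known_words is a hypothetical set of correctly spelled lowercase words.
    return sorted((words(doc_a) & words(doc_b)) - known_words)

# Toy reference list, for illustration only.
known = {"i", "got", "the", "letter", "it", "was", "late"}
print(shared_odd_spellings("I recieved teh letter",
                           "Teh letter was late, I recieved it",
                           known))
```

A shared odd spelling is only a hint, not proof: common typos turn up in many people's writing, so rarer quirks carry more weight.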

fobroth's photo
Thu 03/17/11 05:38 PM
Interesting...
I fed 'lexical fingerprint' to a search engine. On a quick look it didn't turn up any tools, but it did show that the idea is out there.
There are free lexical analyzers for Linux. I suppose one could interpret the output oneself, or pipe it through a Bayesian filter for some stats, like what a spam filter does.
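
To make the spam-filter comparison concrete, here is a minimal sketch of a naive Bayes comparison over word counts, written in Python with only the standard library. Everything in it is an assumption for illustration: the function names are made up, every candidate author is treated as equally likely beforehand, and a real comparison would need far more known writing per author than the toy samples shown.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; odd spellings survive as their own tokens.
    return re.findall(r"[a-z']+", text.lower())

def train(samples_by_author):
    # {author: [text, ...]} -> {author: Counter of word counts}
    return {author: Counter(t for text in texts for t in tokenize(text))
            for author, texts in samples_by_author.items()}

def log_score(counts, vocab_size, tokens):
    # Log-likelihood of the tokens under one author's counts (add-one smoothing).
    total = sum(counts.values())
    return sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)

def most_likely_author(models, text):
    # Vocabulary = every word seen in training, plus one slot for unseen words.
    vocab_size = len(set().union(*models.values())) + 1
    tokens = tokenize(text)
    scores = {a: log_score(c, vocab_size, tokens) for a, c in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical samples of known writing from two candidate authors.
models = train({
    "alice": ["I definately recieved teh letter", "definately teh best"],
    "bob": ["I definitely received the letter", "definitely the best"],
})
author, scores = most_likely_author(models, "teh letter was definately late")
print(author, scores)
```

The scores only rank the candidates that were supplied. They say nothing about whether the real author is among them.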
I'll try to remember to take a look through my software manager for something and let you know what I find, if somebody else doesn't come to the rescue.

Honestly, though, if both writings walk like a duck and the author has lied about being a duck...

This is where I get in trouble...
Or ask a woman. They suck at mind reading but are relatively uncanny at the 'gut feeling'. Trust the creamy gut.

no photo
Thu 03/17/11 10:32 PM

I fed 'lexical fingerprint' to a search engine. On a quick look it didn't turn up any tools, but it did show that the idea is out there.


I threw those words together casually, not really expecting anyone else to use that phrasing. It turns out that "linguistic fingerprinting" is close to what I had in mind, and that the study of "stylometry" is also related.

I can point to specific elements and say "this suggests the author was ----"; I wonder if there are tools to point out other similarities I've missed.
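
One common stylometric idea is to compare how often each text uses ordinary function words ("the", "and", "of", and so on), since those habits are hard to fake. Below is a minimal Python sketch of that idea, assuming a short hand-picked word list and cosine similarity as the comparison. Both choices are illustrative, and the names `FUNCTION_WORDS`, `profile` and `cosine_similarity` are made up here.

```python
import math
import re
from collections import Counter

# A small hand-picked list of English function words (an assumption;
# a serious comparison would use a much longer list).
FUNCTION_WORDS = ["the", "a", "an", "and", "but", "or", "of", "to", "in", "on",
                  "that", "which", "i", "it", "is", "was", "not", "with", "for", "as"]

def profile(text):
    # Relative frequency of each function word in the text.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(p, q):
    # 1.0 means identical proportions; 0.0 means nothing in common.
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

letter = "The letter was sent to me and it was not signed."
known_sample = "It was the first time that I wrote to the office."
print(cosine_similarity(profile(letter), profile(known_sample)))
```

Short texts give noisy numbers, so a score like this is best read against scores from other writers' samples and not on its own.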


Honestly, though, if both writings walk like a duck and the author has lied about being a duck...


Yeah, the situation is pretty weird.

no photo
Fri 03/18/11 07:48 PM


Or ask any high school or college English teacher. They trade in spotting plagiarism all the time. My mom was one, and she could tell instantly if someone else had written a student's paper.

It also has to do with vocabulary usage and style. Another group of experts is HR and placement people who do hiring: experienced ones can tell after about a 5 minute convo whether the applicant wrote the document. Just a thought.

no photo
Sun 03/20/11 09:44 PM
Sweetest and Fobroth, thank you both for your input! Asking an English teacher is a great idea.

It just occurred to me as I was identifying the similarities in the text that someone, somewhere, has probably written software to do it better than I could.