Module Todo: Document Metadata Extraction #717
Replies: 4 comments 12 replies
-
Note: it would also be nice to emit the text from these documents as a generic event consumable by
-
As a precursor to this, I have written a proof-of-concept. @nicpenning, here is the module:
You can use it like this:
    bbot -t evilcorp.com -f subdomain-enum -m filedownload
Pairing it with the web spider can also be very effective:
    bbot -t evilcorp.com -f subdomain-enum -m filedownload -c web_spider_depth=2 web_spider_distance=2
-
This is probably relevant to this discussion: #907 (comment). Now there are As mentioned in the linked discussion, that is an ML model that detects human passwords in several file formats. Perhaps more interesting, though, is that it uses Apache Tika to extract the strings from
which we could then raise as
-
It would be useful to have a collection of modules that download documents (.pdf, .docx, etc.) and extract useful metadata such as usernames and internal domain names. Thanks to @pjhartlieb and @Sw3d1shPh1sh for requesting.
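To make the idea concrete, here is a rough sketch of the extraction step for one format only (.docx), using just the Python standard library. It relies on the fact that a .docx file is a zip archive whose `docProps/core.xml` part carries the `dc:creator` and `cp:lastModifiedBy` fields. The function name `extract_docx_authors` and the returned keys are made up for illustration; this is not the BBOT module API, and a real module would need to cover other formats as well.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml inside Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_docx_authors(data: bytes) -> dict:
    """Hypothetical helper: pull author-like fields out of raw .docx bytes.

    Returns a dict with "creator" and/or "last_modified_by" when present,
    or an empty dict if the package has no core properties part.
    """
    found = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return found
        root = ET.fromstring(zf.read("docProps/core.xml"))

    # Both fields are direct children of the cp:coreProperties root element.
    for key, tag in (("creator", "dc:creator"),
                     ("last_modified_by", "cp:lastModifiedBy")):
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[key] = node.text.strip()
    return found
```

Something like `extract_docx_authors(open("report.docx", "rb").read())` would then give back whatever usernames the document carries, which could be raised as events downstream.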
Also, per @nicpenning:
Would require:
EDIT: Possible sources of metadata-extraction logic: