A simple web spider with a desktop interface
Treasure Audit is a cross-platform desktop web spider that lets users crawl and search websites.
- Import a list of websites to crawl from a .txt file
- Export a list of matched pages
- Add a virtually unlimited number of matching criteria to filter for content
- View pages with an HTML text viewer or HTML renderer
- Highlight matches within the HTML view
- Flexible interface optimized for webmasters and people building out sites
- Real-time auditing – pages are fetched live, so there is no need to rely on Google's cache
- Linux, Mac, and Windows compatibility
Treasure Audit is available on GitHub as a standalone binary for Linux, OS X, and Windows. You can also run it from source.
Choose a scheme (such as http:// or https://), enter the URL of the website you want to crawl, then click Crawl.
Add your criteria to narrow down the list of matched pages.
After the pages are crawled, you can click an individual page to view its HTML, or render it in a built-in web browser (without assets such as CSS, JS, or images).
To see where your criteria match within the HTML, enable highlighting via View > Highlight Matches. Your matches will be highlighted in green in the HTML viewer, and you can then scroll through the viewer to find each one.
If you find a match that you don’t want to include, copy and paste it into the criterion box and choose to ignore it.
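To illustrate how include and ignore criteria can combine, here is a minimal sketch in Python. It is a hypothetical model, not Treasure Audit's actual code: the names `page_matches` and `filter_pages` are made up, and it assumes case-sensitive substring criteria where ignored snippets are removed before the include criteria are checked. Under that assumption, ignoring one specific match (such as a search form repeated on every page) drops pages whose only match was that snippet, while pages with other matches stay in the list.

```python
# Hypothetical sketch only: NOT Treasure Audit's implementation.
# Assumes plain, case-sensitive substring criteria.

def page_matches(html: str, include: list[str], ignore: list[str]) -> bool:
    """Return True if the page still contains an include criterion
    after every ignored snippet has been removed."""
    for snippet in ignore:
        # Strip ignored snippets so matches inside them no longer count.
        html = html.replace(snippet, "")
    return any(term in html for term in include)


def filter_pages(pages: dict[str, str], include: list[str], ignore: list[str]) -> list[str]:
    """Return the URLs of crawled pages that pass the criteria."""
    return [url for url, html in pages.items() if page_matches(html, include, ignore)]


if __name__ == "__main__":
    pages = {
        "https://example.com/": '<form id="search"></form><p>Home</p>',
        "https://example.com/contact": '<form id="search"></form><form id="contact"></form>',
    }
    # Look for forms, but ignore the site-wide search form.
    print(filter_pages(pages, ["<form"], ['<form id="search"></form>']))
```

Here the home page drops out because its only form was the ignored one, and the contact page is kept because it has a second form.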
You can open the page you’re viewing in your default web browser via Menu > Edit > Open Page in Web Browser.
This process is especially useful if you’re auditing your own site for content and want to edit every instance of an image, form, slideshow, plugin, or anything else.
- Logo – Shamash Teran
- Icons – Cole Bemis
- GUI and Software – Marcelo Cubillos
Treasure Audit is released under the GPL v3 license.