Message 00413: linkcheckers
Hi -
For link checkers, this is the commercial package I've used in the past:
http://www.versiontracker.com/dyn/moreinfo/macosx/15878
I've also gotten this Python tool to work:
http://sourceforge.net/projects/linkchecker/
Finally, this is the W3C tool:
http://search.cpan.org/dist/W3C-LinkChecker/
In terms of data collection, here are just a few of the things we might
want to look for:
1. are the links valid?
2. is the html valid? (for a statistical subset of the pages, perhaps)
3. is the css valid?
4. are the pages accessible? (e.g., alt tags on images, etc.)
Those are all basically crawls, so they have to be done within the
context of the domain, the total number of pages in that "area", etc.
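Just to make the shape of that concrete, here is a rough sketch of
the kind of same-domain crawl I mean, using only the Python standard
library. This isn't any of the tools above, just an illustration; the
page limit and the function names are made up by me.

import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    # collects href values from <a> tags as the page is parsed
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=500):
    # stay inside the starting domain, stop after max_pages fetches
    domain = urlparse(start_url).netloc
    seen, queue, broken, pages = set(), [start_url], [], 0
    while queue and pages < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                body = resp.read().decode("utf-8", "replace")
        except (urllib.error.URLError, ValueError):
            broken.append(url)
            continue
        pages += 1
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc == domain:
                queue.append(absolute)
    return pages, broken

pages, broken = crawl("http://www.dhs.gov/")
print(pages, "pages crawled,", len(broken), "broken links")

Something like that gives the page count and broken-link count for one
domain; validating the HTML and CSS would hang off the same crawl.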
It seems that each "audit" has a scope (that business of agency and
departments, domain names, etc.) and a series of results within that
scope. The trick is to be able to go beyond "38 broken links for www.dhs.gov,
which has 1,038 pages" and keep this extensible so we can add new
audit techniques, whether scripts or something else.
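One way I could imagine keeping it extensible is to treat each audit
technique as a plug-in function that takes a scope and returns its
results. The names below are mine, and the numbers are just the
www.dhs.gov figures from the example above, used as placeholders.

from dataclasses import dataclass

@dataclass
class Scope:
    agency: str     # e.g. "Department of Homeland Security"
    domain: str     # e.g. "www.dhs.gov"

def broken_link_audit(scope):
    # would call a crawler like the one sketched earlier; hard-coded
    # placeholder numbers stand in for its output here
    return {"pages": 1038, "broken_links": 38}

# registry of audit techniques; a new technique is one more entry here
AUDITS = {"links": broken_link_audit}

def run_audits(scope):
    return {name: audit(scope) for name, audit in AUDITS.items()}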
Some of the other things we might be interested in measuring:
1. is there ftp service? rsync service?
2. nmap: what is the OS of the target system(s) and/or firewalls? Is it
"current"? Are there "bad" open ports?
3. is there a privacy statement?
4. is the site usable? Perhaps 4 designers all look at a site and
give it a grade plus some comments.
5. does a panel of experts feel the site meets the 8 principles?
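A couple of those are simple enough to sketch without nmap at all: a
plain socket connect for the service checks, and a naive look at the
home page for a privacy statement. Treat this as illustration only,
not as how the real checks should be done.

import socket
import urllib.request

def port_open(host, port, timeout=5):
    # a bare TCP connect is enough to tell whether something answers
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def has_privacy_statement(url):
    # crude: just looks for the word "privacy" on the home page
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8", "replace").lower()
    return "privacy" in body

def service_checks(host):
    return {
        "ftp": port_open(host, 21),
        "rsync": port_open(host, 873),
        "privacy_statement": has_privacy_statement("http://" + host + "/"),
    }

The OS and firewall questions really do need nmap or something like
it; I wouldn't try to reinvent that.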
Finally, I would think there is some kind of descriptive information
about the target: name of the agency, who is the chief privacy
officer, what is their address to write to with complaints, etc.
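That descriptive information could just ride along with each target
as a small record; the field names here are guesses at what we would
want to capture.

from dataclasses import dataclass

@dataclass
class TargetInfo:
    agency_name: str
    chief_privacy_officer: str
    complaint_address: str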
At the end of the day, I'd have a series of scripts that I run
periodically, which generate these various metrics. Then, a reporting
script would go in and create lists, rankings, and tables.
For example, for each subdirectory in my master directory, make a
table listing the name of the agency, the absolute number of bad
links, the percentage of bad links, the number of css errors, and the
panel of experts metric for usability. Then, calculate some overall
ranking based on weighting of those individual metrics.
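For the roll-up, the reporting script might look something like this.
The metric names and weights are invented for illustration; lower
scores rank better.

# lower is better for the error metrics, higher is better for the
# usability grade, so the grade is subtracted before weighting
WEIGHTS = {"pct_bad_links": 0.4, "css_errors": 0.3, "usability_grade": 0.3}

def overall_score(metrics):
    return (WEIGHTS["pct_bad_links"] * metrics["pct_bad_links"]
            + WEIGHTS["css_errors"] * metrics["css_errors"]
            - WEIGHTS["usability_grade"] * metrics["usability_grade"])

def ranking(table):
    # table maps agency name -> metrics dict; best (lowest) score first
    return sorted(table.items(), key=lambda item: overall_score(item[1]))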
I think I'd start with the W3C tools and then look to see whether there
are alternatives that seem to be both used and useful.
The primary goal here is to perform the audit. But the secondary
goal is to have the scripts/tools work in a way that other folks might
be able to use them on their own targets. If we can meet both goals,
that would be great. If not, this has to be at least simple enough
that I can figure it out and run it.
Carl