
Message 00413: linkcheckers

Hi -

For link checkers, this is the commercial package I've used in the past:


I've also got this Python thing to work:


Finally, this is the W3C tool:


In terms of data collection, here are just a few of the things we might want to look for:

1. are the links valid?
2. is the HTML valid? (for a statistical subset of the pages, perhaps)
3. is the CSS valid?
4. are the pages accessible? (e.g., alt attributes on images, etc.)
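As a rough illustration of items 1 and 4, the sketch below uses only Python's standard-library `html.parser` to pull the links out of a single page (so they can then be checked) and to flag `<img>` tags that have no `alt` attribute at all. The class and function names are hypothetical; it does not fetch pages or validate HTML/CSS, which is what the W3C tools are for.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageAuditParser(HTMLParser):
    """Collect outgoing links and flag <img> tags that lack an alt attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []               # absolute URLs found in <a href="...">
        self.images_missing_alt = []  # src values of <img> tags with no alt

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and "alt" not in attrs:
            # alt="" is legitimate for decorative images, so only a missing
            # attribute is flagged.
            self.images_missing_alt.append(attrs.get("src", ""))


def audit_page(html, base_url):
    """Return (links, images_missing_alt) for one page of HTML."""
    parser = PageAuditParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.images_missing_alt
```

Self-closing tags such as `<img ... />` are handled too, because `HTMLParser`'s default `handle_startendtag` forwards to `handle_starttag`.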

These are all basically crawls, so the results have to be read in the context of the domain, the total number of pages in that "area", etc.
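To keep a crawl inside the audit's "area" and to report broken links relative to what was actually crawled, something like the following could work. It is a sketch using only `urllib.parse`; treating subdomains of the target domain as in scope, and counting a failed request or any status of 400 or above as broken, are assumptions rather than settled choices.

```python
from urllib.parse import urlparse


def in_scope(url, domain):
    """Return True if url is an http(s) URL on `domain` or one of its subdomains."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False  # skip mailto:, ftp:, javascript:, etc.
    host = (parts.hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def broken_link_rate(results):
    """results maps URL -> HTTP status code, or None if the request failed.

    Returns the percentage of checked links that are broken.
    """
    if not results:
        return 0.0
    broken = sum(1 for status in results.values() if status is None or status >= 400)
    return 100.0 * broken / len(results)
```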

It seems that each "audit" has a scope (that business of agencies and departments, domain names, etc.) and a series of results within that scope. The trick is to be able to go beyond "38 broken links for www.dhs.gov which has 1,038 pages" and to keep this extensible so we can add new audit techniques, whether scripts or something else.
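One way to keep the audit extensible, sketched under assumptions of my own: a `Scope` record for the agency, department, and domains, plus a registry of check functions keyed by name, so adding a new audit technique means registering one more function. All names here are hypothetical, and the `broken_links` check just returns placeholder numbers echoing the example above rather than crawling anything. An external script could be plugged in the same way, by registering a function that runs it (e.g. via `subprocess.run`) and parses its output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Scope:
    agency: str
    department: str
    domains: List[str]


@dataclass
class AuditResult:
    check: str
    scope: Scope
    metrics: Dict[str, float] = field(default_factory=dict)


# Registry of audit techniques: each check takes a Scope and returns named metrics.
CHECKS: Dict[str, Callable[[Scope], Dict[str, float]]] = {}


def register(name):
    """Decorator that adds a new audit technique to the registry under `name`."""
    def wrap(func):
        CHECKS[name] = func
        return func
    return wrap


def run_audit(scope):
    """Run every registered check against one scope."""
    return [AuditResult(name, scope, func(scope)) for name, func in CHECKS.items()]


@register("broken_links")
def broken_links(scope):
    # Placeholder values only; a real check would crawl scope.domains.
    return {"pages": 1038, "broken": 38}
```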

Some of the other things we might be interested in measuring:

1. is there FTP service? rsync service?
2. nmap: what OS is the target system (or systems, and/or firewalls) running? Is it "current"? Are there "bad" open ports?
3. is there a privacy statement?
4. is the site usable? Perhaps 4 designers each look at the site and give it a grade plus some comments.
5. does a panel of experts feel the site meets the 8 principles?
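Item 1 only needs a plain TCP connection test, which can be sketched with the standard `socket` module as below. Ports 21 (FTP) and 873 (rsync daemon) are the usual well-known ports, but a site could run either service elsewhere; OS detection and a fuller port survey (item 2) are better left to nmap itself, e.g. its `-O` option. The function names are hypothetical.

```python
import socket

# Usual well-known ports for the services in item 1 (an assumption; sites may differ).
SERVICE_PORTS = {"ftp": 21, "rsync": 873}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure, ...
        return False


def probe_services(host, services=SERVICE_PORTS):
    """Map each service name to whether its port accepted a connection."""
    return {name: port_open(host, port) for name, port in services.items()}
```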

Finally, I would think there is some kind of descriptive information about the target: the name of the agency, who the chief privacy officer is, the address to write to with complaints, etc.

At the end of the day, I'd have a series of scripts that I run periodically, which generate these various metrics. Then, a reporting script would go in and create lists, rankings, and tables. For example, for each subdirectory in my master directory, it would make a table listing the name of the agency, the absolute number of bad links, the percentage of bad links, the number of CSS errors, and the panel-of-experts usability metric. Then, it would calculate an overall ranking based on a weighting of those individual metrics.
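The reporting step could look roughly like this. It assumes, purely for illustration, that each agency's subdirectory contains a `metrics.json` file with `pages`, `bad_links`, `css_errors`, and `usability` fields; that file layout, the function names, and the weights are hypothetical placeholders, not something the collection scripts produce today.

```python
import json
from pathlib import Path

# Hypothetical weights: errors count against a site, usability counts for it.
WEIGHTS = {"bad_link_pct": -1.0, "css_errors": -0.1, "usability": 2.0}


def load_rows(master_dir):
    """Read <master_dir>/<agency>/metrics.json for every subdirectory (assumed layout)."""
    rows = []
    for sub in sorted(Path(master_dir).iterdir()):
        metrics_file = sub / "metrics.json"
        if sub.is_dir() and metrics_file.exists():
            row = json.loads(metrics_file.read_text())
            row["agency"] = sub.name
            # Percentage of bad links relative to pages crawled.
            row["bad_link_pct"] = (
                100.0 * row["bad_links"] / row["pages"] if row["pages"] else 0.0
            )
            rows.append(row)
    return rows


def rank(rows, weights=WEIGHTS):
    """Attach a weighted score to each row and return the rows sorted best-first."""
    for row in rows:
        row["score"] = sum(w * row[key] for key, w in weights.items())
    return sorted(rows, key=lambda r: r["score"], reverse=True)


def print_table(rows):
    """Print one fixed-width line per agency."""
    print(f"{'agency':<12}{'bad':>6}{'bad %':>8}{'css':>6}{'usab.':>7}{'score':>8}")
    for r in rows:
        print(f"{r['agency']:<12}{r['bad_links']:>6}{r['bad_link_pct']:>8.1f}"
              f"{r['css_errors']:>6}{r['usability']:>7.1f}{r['score']:>8.2f}")
```

Calling `print_table(rank(load_rows("audits")))` on a directory laid out that way would print one row per agency, best score first. Picking the actual weights is a judgment call for whoever owns the rankings.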

I think I'd start with the W3C tools and then look for alternatives that seem to be widely used and useful.

The primary goal here is to perform the audit. The secondary goal, though, is to have the scripts and tools work in a way that other folks could use them on their own targets. If we can meet both goals, that would be great. If not, this has to be at least simple enough that I can figure it out and run it.