So I've been cruising through the book Programming Spiders, Bots, and Aggregators in Java by Jeff Heaton, published by Sybex. I complained the other day about the author's writing style. Today, though, I come to tell you that the content is good.
So far, I've been learning about the HTTP protocol, the way headers are sent and received, how to parse HTML, how to handle sockets, and more. I'm getting ready to bust into the section on building a high-volume spider using JDBC or ODBC and threading. Maybe I'll even start running something relatively soon. The only problem is that I don't know what I'll do with the data yet.
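To give a flavor of the header side of things, here's a rough sketch of my own (not code from the book). It builds the text of a bare-bones GET request and pulls the headers out of a canned response string. The class and method names (HttpHeaderSketch, buildGetRequest, parseHeaders) are made up for illustration, and it skips real-world wrinkles like folded headers and repeated header names.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; the names here are not from the book.
public class HttpHeaderSketch {

    // Build the text of a minimal HTTP/1.1 GET request.
    // Each line ends in CRLF, and a blank line ends the header block.
    public static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Parse the header block of a raw response into name -> value pairs.
    // Header names are lower-cased because HTTP treats them case-insensitively.
    public static Map<String, String> parseHeaders(String rawResponse) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = rawResponse.split("\r\n");
        // lines[0] is the status line (e.g. "HTTP/1.1 200 OK"), so start at 1.
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                break; // blank line: headers are done, the body follows
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip anything that isn't "Name: value"
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        // The request text a spider would write to a socket's output stream.
        System.out.print(buildGetRequest("example.com", "/"));

        // A canned response stands in for what would come back over the socket.
        String canned = "HTTP/1.1 200 OK\r\n"
                      + "Content-Type: text/html\r\n"
                      + "Content-Length: 5\r\n"
                      + "\r\n"
                      + "hello";
        System.out.println(parseHeaders(canned));
    }
}
```

In a real spider, the request string would go out over a java.net.Socket connected to port 80, and the response would be read back from the socket's input stream instead of a hard-coded string.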
As an aside, all of this Java code means I need to brush up on my Java programming abilities... Better go get my Java book out...