I’ve spent the last few days working on getting data out of Sharepoint Wiki, and its shocking. If you read the webservices API and believe it, things would be simple. Sadly, its got some pretty major bugs, and some pretty woeful architecture too.
The worst finding is that although Sharepoint lists have a webservice API to get versioned data, its broken – all versions of the MetaInfo return the text of the last revision. So I had to resort to brute force html GET’s and parsing the html to try to get the historical info.
Still, data gathered and saved – next week I’ll start trying to extract the valuable user written text from the masses of shoddy html (like in MS Word to html, every line is surrounded by the same 100 character css styles, setting font to Verdana etc.