>>lessonizing script you had? (that allowed you to show "just the links, ma'am" on PAMS type joints?)
>
>Actually this was ALL new code.
*starts slow clap*
>The other projects only looked at a thread's meta information, like authorship, replies, views, etc. This code had to actually make sense of dcforums' awful, awful, awful HTML and try to parse out each post as well as the thread's tree structure.
ahh. that's right. because the other one was pretty much summarizing what was already there, and you could use the links. but even if you started from scratch, i suppose the experience at least made you familiar w/ the approach you'd have to take.
>This was also an interesting case because the post was so fucking huge. There are basically two options if you want to use an existing library to parse HTML. You can use slow, memory-hungry libraries to parse crappy HTML, or you can use super fast and efficient libraries to parse perfect HTML. This particular post completely broke my usual methods of parsing crappy HTML, and of course the fast methods, which only handle perfect HTML, weren't an option.
>
>So I had to create a completely custom pure regular expression solution (read: FAST) to parse out a post's raw relevant data, and then recombine that into Post objects. Or something like that.
damb. i don't even fux w/ regexps... i just kind of squint at them sideways and think, 'hmm, i should learn about this sometime in the distant future when it's obsolete'.
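For anyone squinting at regexps sideways, here is a minimal sketch of the "one pass of pure regex, then recombine into Post objects" idea. The HTML it matches is made up for illustration, since the thread never shows what dcforums actually spits out, and `POST_RE`, `Post`, and `parse_posts` are hypothetical names, not the original script's code.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL markup: the thread never shows dcforums' real HTML, so this
# pattern only illustrates the one-pass regex idea. The opening post has no
# "(reply to #N)" marker, so that group is optional.
POST_RE = re.compile(
    r'<a name="(?P<num>\d+)"></a>\s*'
    r'<b>(?P<title>.*?)</b>\s*by\s*(?P<author>[^<]+?)\s*'
    r'(?:\(reply to #(?P<parent>\d+)\)\s*)?'
    r'<div class="msg">(?P<message>.*?)</div>',
    re.DOTALL | re.IGNORECASE,
)


@dataclass
class Post:
    num: int
    parent: int  # 0 means "top of the thread" (a convention chosen here)
    author: str
    title: str
    message: str


def parse_posts(html: str) -> list[Post]:
    """Scan the raw page once and recombine each match into a Post."""
    return [
        Post(
            num=int(m["num"]),
            parent=int(m["parent"]) if m["parent"] else 0,
            author=m["author"].strip(),
            title=m["title"].strip(),
            message=m["message"].strip(),
        )
        for m in POST_RE.finditer(html)
    ]
```

The tradeoff is the one described above: no DOM gets built, so a huge page is just one linear scan, but the pattern is welded to one site's markup and breaks the day that markup changes.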
>Right now for each reply in a post I'm trapping:
>author (name, but not ID)
>title
>message
>post num
>parent num
>
>Still need to get:
>author id
>time stamp
but having done the former, that's trivial, i suppose.
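The post num / parent num pair is already enough to rebuild the thread's tree structure. A rough sketch, assuming the opening post is stored with parent 0 (a convention picked here, not necessarily the script's); `build_tree` and `render` are hypothetical helpers.

```python
from collections import defaultdict
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass
class Post:
    num: int
    parent: int   # 0 = replying to nothing, i.e. the opening post (assumed)
    author: str
    title: str


def build_tree(posts: list[Post]) -> dict[int, list[Post]]:
    """Map each post number to its direct replies, in posting order."""
    children: dict[int, list[Post]] = defaultdict(list)
    for p in posts:
        children[p.parent].append(p)
    for replies in children.values():
        replies.sort(key=lambda p: p.num)
    return children


def render(children: dict[int, list[Post]], parent: int = 0,
           depth: int = 0) -> Iterator[str]:
    """Walk the tree depth-first, indenting two spaces per reply level."""
    for p in children.get(parent, []):
        yield "  " * depth + f"{p.num}. {p.title} ({p.author})"
        yield from render(children, p.num, depth + 1)
```

Author id and time stamp would just be two more fields on `Post`; the tree logic doesn't care about them.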
>>onliest thing you could do is link back (from your parsed out copies of ?uest's replies) to the original poast so that, if they wanted, folks could hop over there to add they lulz and whatnot. but that's really not necessary.
>
>I'd need to trap the post id and thread id for that I think. Probably not that hard. It looks like people are actually using this shit. If I see it getting linked to or anything I'll definitely do some upgrades.
obviously ?uest an nem have some kinda affinity for it, which is why it was resurrected from the archives. my guess is that some industry heads are *really* getting a kick out of this shit and they want ?uest to keep it going.
either that or ?uest recognizes this as his 'broke diaries'. b/c, when you think about it, that's how ang blew up. she was making what we now consider blog entries, but just serial diary poastings, and then Villard got at her (or she got at them), and she cleaned them up, added some intervening narrative and bam -- book.
at the very least, what you've done thus far has just saved some poor intern from compiling every one of these anecdotes. to be honest, if i was ?uest, i'd pay you to write a generic version of this parser so that he could aim it at archived poasts, etc, and grep (for lack of a better word) out his myriad musings. there are probably three or four books' worth of his material floating around between here, the lessahn, artist, and prolly ptp.
he would NEVER have the time to sort through it all the old fashioned way. but if he had all his shit in one place...
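The "grep out his musings" step could be as small as this, assuming every archived thread has already been parsed into posts that carry an author name. `Post`, its `thread` label, and `musings` are hypothetical; the optional `pattern` is a regex for narrowing down to one topic.

```python
import re
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    thread: str   # hypothetical label for the archived thread it came from
    num: int
    author: str
    message: str


def musings(posts: Iterable[Post], author: str,
            pattern: Optional[str] = None) -> Iterator[Post]:
    """Yield one author's posts, optionally only those whose text matches `pattern`."""
    wanted = author.casefold()
    rx = re.compile(pattern, re.IGNORECASE) if pattern else None
    for p in posts:
        if p.author.casefold() != wanted:
            continue  # somebody else's reply
        if rx is not None and not rx.search(p.message):
            continue  # right author, wrong topic
        yield p
```

The name match is case-insensitive, so "?UEST" and "?uest" count as the same poster.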
matter of fact... have you ever seen freemind? (freeware mindmapping software).
if you wrote this same script so he (or anyone) could pick out all of his replies among a range of poasts, and then paste them into freemind.
then you could use freemind to drag around each of those headings into whatever organizational structure you chose (like a freeform outline).
then you basically have your chapter structure and about 75% of the text you need. you'd just export your tree view as your outline and then click each individual node of the mindmap to pull up the actual text (no different than how we click on a band or person's name in that poast to get to the story via your hypnogogics tool) and cut and paste that into the document.
instant book. just add water.
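The "paste them into freemind" step could skip the pasting. This sketch assumes FreeMind's `.mm` files are XML with a `map` root and nested `node` elements whose label lives in a `TEXT` attribute, which is how FreeMind stores its maps; the `version` value and the idea of putting each post's text in a child node under its heading are choices made here. `to_freemind` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def to_freemind(center_title: str, sections: list[tuple[str, str]]) -> str:
    """Build FreeMind-style XML: a center node, one child per heading, and the
    post text as a grandchild so it can be opened node by node."""
    root = ET.Element("map", version="1.0.1")  # version string is an assumption
    center = ET.SubElement(root, "node", TEXT=center_title)
    for heading, body in sections:
        branch = ET.SubElement(center, "node", TEXT=heading)
        ET.SubElement(branch, "node", TEXT=body)
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a file ending in `.mm` should give something FreeMind can open, with every heading ready to be dragged into chapter order.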
this tool would be hella site specific. but it could allow a fairly prolific writer/poaster to datamine their net presence with MUCH more efficiency than searching and clicking and cutting and pasting. i'm serious. if ahmir has any plans on writing books, it would be well worth hitting you off with a little change to do it in the manner i just outlined.
i mean, shit. he could do a book on soul train just by taking the same approach to THAT ginormous poast. let alone all of the other jewels he's dropped in various joints that he would *never* be able to scour the interwebs and find on his own (nor would any okp intern, including the most stannish cats he could hire to do it manually).
>>this is a great reader.
>>
>>the other thing you could do which would be hot would be to change the cell shading or text color based upon a timestamp date. i imagine you crawl the poast to catch any new updates. so that way, someone could look at this page, and see something in orange, and know that that's the equivalent of DJ CLUE screamin in they ears.
>
>I really thought about a bunch of stuff like this - I wanted to offer different sorting options, so you could sort by recency (post number) or popularity (number of replies) or alphabetically (as it is now).
>
>Then I was gonna shade cells on a range from white to blue or something based on HOW popular or recent it was or whatever. But by the time I cracked through dcforums' horrible HTML I just felt like getting something up quickly.

that's basically all a tag cloud is doing, right? lol. folks' feelings would be hella hurt if they shit was light light light LIGHT grey, and then somebody else's mention is hella DOARK. but a lolz and cosign based tag cloud would be dope.
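Both upgrades mentioned there are small once the data is parsed. A sketch, assuming each index entry knows its post number (recency) and reply count (popularity); `Entry`, `SORT_KEYS`, `sort_entries`, and `shade` are hypothetical names. The shading is a straight linear blend from white to blue, and the same function could be fed post numbers to make the newest entries stand out.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str       # e.g. a band or person mentioned in the thread
    post_num: int   # higher = more recent
    replies: int    # stand-in for popularity


SORT_KEYS = {
    "recency": lambda e: -e.post_num,     # newest first
    "popularity": lambda e: -e.replies,   # most replies first
    "alpha": lambda e: e.name.casefold(), # how the page sorts now
}


def sort_entries(entries: list[Entry], how: str = "alpha") -> list[Entry]:
    return sorted(entries, key=SORT_KEYS[how])


def shade(value: float, lo: float, hi: float) -> str:
    """Map value in [lo, hi] linearly from white (#ffffff) to blue (#0000ff)."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)
    level = round(255 * (1 - t))  # red and green fade out, blue stays full
    return f"#{level:02x}{level:02x}ff"
```

For a lolz-and-cosign tag cloud, the same `shade` call would just take a count of those replies instead of `replies`.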
(your zeitgeist joint was hype as hell, too).
peace & blessings,
x.
========================================= ** i move away from the mic to breathe in