Just Another Kilo Biting Blog | Just another WordPress weblog

September 3, 2006

Algorithms for data processing

    A frustrating day. I'm working on an algorithm to compare HTML template similarity using the HTML DOM tree. It has taken nearly 80 hours of thinking and brainstorming, and the method looks less promising than ever.
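One simple way to put a number on template similarity is sketched below. It is not the algorithm described in this post, just an assumption-laden baseline: each page is reduced to a multiset of root-to-node tag paths, and two pages are compared with a Jaccard ratio over those multisets. The names `TagPathCollector`, `tag_paths` and `template_similarity` are made up for this illustration, and only Python's standard `html.parser` is used.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Counts every root-to-node tag path seen in a document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []         # currently open tags, outermost first
        self.paths = Counter()  # path string -> number of occurrences

    def handle_starttag(self, tag, attrs):
        self.paths["/".join(self.stack + [tag])] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def template_similarity(html_a, html_b):
    """Multiset Jaccard ratio of tag paths: 1.0 means identical structure."""
    a, b = tag_paths(html_a), tag_paths(html_b)
    shared = sum((a & b).values())  # per-path minimum counts
    total = sum((a | b).values())   # per-path maximum counts
    return shared / total if total else 1.0


page_a = "<html><body><div><p>Hello</p><p>World</p></div></body></html>"
page_b = "<html><body><div><p>Foo</p><p>Bar</p></div></body></html>"
page_c = "<html><body><table><tr><td>x</td></tr></table></body></html>"
print(template_similarity(page_a, page_b))
print(template_similarity(page_a, page_c))
```

Text content is ignored on purpose, so two pages rendered from the same template score high even when their posts differ. Real forum pages vary in how many posts they hold, so the raw counts would probably need normalising before the score means much.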

    What I'm trying to build is a search engine, from the crawler to the results page. I'm kind of stuck at the data processing stage. Stripping HTML, structuring groups of data, and identifying the key topic of a page are not as easy as they seem. Every one of them has turned out to be a problem!
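For the HTML-stripping step, a minimal sketch using only Python's standard library might look like the following. It assumes that dropping `<script>` and `<style>` content and collapsing whitespace is enough for a first pass. `TextExtractor` and `strip_html` are hypothetical names for this illustration, not part of the search engine described here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style content (hypothetical helper)."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def strip_html(html):
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Join chunks with spaces, then collapse all runs of whitespace.
    # Caveat: a word split by inline tags ("wel<b>come</b>") becomes two words.
    return " ".join(" ".join(extractor.chunks).split())


sample = ("<html><head><style>p{}</style></head><body>"
          "<p>Hello &amp; <b>welcome</b></p><script>var x=1;</script>"
          "</body></html>")
print(strip_html(sample))
```

This only covers stripping. Grouping the data and finding the key topic of a page are separate problems that this sketch does not attempt.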


1 comment

  • kowkaybin · September 3, 2006 at 11:46 pm

    I have ~20k pages from a vBulletin board as sample data, and I'm now running a word analysis program over them to measure word frequencies. From this I hope to get:

    1. Template words (which should not be evaluated at query time)
    2. Groupings of words (synonyms, you might call them)
    3. *Spelling correction, based on an algorithm that I have not come up with yet
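For the first item, one plausible heuristic (an assumption, not the word analysis program mentioned above) is document frequency: a word that shows up on nearly every page is probably part of the board template rather than the post content. The function name `template_words` and the 0.9 threshold are made up for this sketch, and the input is assumed to be page text with the HTML already stripped.

```python
import re
from collections import Counter


def template_words(pages, threshold=0.9):
    """Return words present in at least `threshold` of the pages (hypothetical helper)."""
    doc_freq = Counter()
    for text in pages:
        # Use a set so each page counts a word once, however often it repeats.
        doc_freq.update(set(re.findall(r"[a-z0-9']+", text.lower())))
    cutoff = threshold * len(pages)
    return {word for word, count in doc_freq.items() if count >= cutoff}


pages = [
    "Reply Quote Post how to tune a carburetor",
    "Reply Quote Post best pizza in town",
    "Reply Quote Post tune your guitar",
]
print(sorted(template_words(pages)))
```

On a real sample the threshold would need tuning, since very common ordinary words ("the", "and") would also pass it. That may be acceptable if such words are meant to be ignored at query time anyway.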
