Tuesday, March 27, 2012

Difference between indexing txt or word doc files, help!

Hi,
I'm putting together a system, and one of the requirements is a
searchable CV function.
I've got all the code to load the files into image fields, and I've indexed
them and got it more or less working.
Before I go too far down this road, what is your opinion on holding txt files
instead of doc files in the table being searched? SQL seems to be more
flexible when searching text than binary files, and the index files
themselves are smaller. For example, when I tried a LIKE clause it told me
this would only work against a varchar field
(I'm thinking this may be a schoolboy error, so forgive me).
My main concern is that I'd have to do the text conversion automatically.
Any pointers on this?
Does anyone have any views on the best way to go about this, or on
holding Word files and running full-text searches against them?
Many thanks for any help you can give.
Jim Florence
Text means faster indexing times, but not by much. With text you can query
the columns and read the contents; you can't do this with binary.
Full-text search queries are equally flexible with text and binary. You can
only use LIKE against text or char columns.
To do the conversion, use filtdump -b (you can get this from the Platform
SDK), or use OLE automation against the Word documents to extract
the text paragraph by paragraph.
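
For anyone wanting to automate the filtdump route, here is a minimal Python sketch. It assumes filtdump.exe from the Platform SDK is on the PATH and that `filtdump -b` prints the extracted text to standard output. The function names, and the rule of writing a .txt file next to each .doc, are hypothetical and only there for illustration; they are not part of the SDK.

```python
import subprocess
from pathlib import Path

# Assumption: filtdump.exe (Platform SDK) can be found on the PATH.
FILTDUMP = "filtdump"


def build_command(doc_path):
    """Return the argument list for running filtdump -b on one document."""
    return [FILTDUMP, "-b", str(doc_path)]


def text_path_for(doc_path):
    """Hypothetical naming rule: same folder and file name, .txt extension."""
    return Path(doc_path).with_suffix(".txt")


def convert(doc_path):
    """Run filtdump on one file and save whatever it prints as a .txt file."""
    result = subprocess.run(build_command(doc_path), capture_output=True, check=True)
    out_path = text_path_for(doc_path)
    # The output encoding is not assumed here, so the raw bytes are stored.
    out_path.write_bytes(result.stdout)
    return out_path


def convert_folder(folder):
    """Convert every .doc file directly inside the given folder."""
    return [convert(p) for p in sorted(Path(folder).glob("*.doc"))]
```

Calling `convert_folder` on the folder that holds the CVs would leave one .txt file beside each .doc, ready to load into a text or varchar column. It is worth checking the encoding of the output on a sample file before bulk loading.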
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

Hilary,
Many thanks for that, very, very useful. I've started playing with the
Indexing Service as well, to try to find the best fit.
I'll give this a go.
Many thanks for such a quick and informative response.
Regards,
Jim