I'm trying to export our attached documents from the doc_contents field of T_CUST_DOC, and I'm having some trouble. As a test, I exported one using BCP, but Adobe gave me this error: "Acrobat could not open '762bcp.pdf' because it is either not a supported file type or because the file has been damaged."
I read in another message that these attached documents are stored as OLE objects with a header. I noticed the size difference between my output file and a manually extracted one, and wrote a quick Python script to export the binary file and strip off the first 1437KB. It created a file of the same size as the original, but Adobe still won't open it, and a file compare shows they differ.
So any advice here on how to export these images? They're taking up too much space and we've decided to let them live outside of Tess.
Thanks,
John
The OLE header varies in length, but it always falls within the first 300 characters of the file. Its length and position also depend on the file type, so simply removing data from the start of the file until the sizes match won't do the trick, I'm afraid.
This article gives a good example of a function that strips the OLE header and extracts the underlying binary file for various graphics file types. You may be able to locate the corresponding data in your PDF files and modify the function to work for those as well:
http://blogs.msdn.com/b/pranab/archive/2008/07/15/removing-ole-header-from-images-stored-in-ms-access-db-as-ole-object.aspx
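Since you're already scripting this in Python, here is a rough sketch of the same signature-search idea applied to PDFs. It relies on the fact that a PDF file starts with the marker %PDF- and ends with %%EOF. Everything else is an assumption on my part: the function name extract_pdf is made up, the 300-byte search window is just the figure mentioned above, and I haven't verified that the wrapper around doc_contents behaves like the Access OLE wrapper in the article.

```python
PDF_START = b"%PDF-"   # every PDF file begins with this signature
PDF_END = b"%%EOF"     # and its last section ends with this marker


def extract_pdf(blob: bytes, search_limit: int = 300) -> bytes:
    """Return the PDF bytes embedded in an OLE-wrapped blob.

    search_limit is an assumption: the header is expected to end
    within the first few hundred bytes, so the signature is only
    looked for there.
    """
    start = blob.find(PDF_START, 0, search_limit)
    if start == -1:
        raise ValueError("no %PDF- signature found near the start of the data")

    # Use the LAST %%EOF: incrementally saved PDFs contain several.
    end = blob.rfind(PDF_END)
    if end < start:
        # No end marker found; keep everything after the signature.
        return blob[start:]

    # Drop whatever the wrapper appended after the end marker.
    return blob[start:end + len(PDF_END)]
```

You would pass in the raw bytes read from doc_contents and write the result to a .pdf file opened in binary mode. The original file may have had a line ending after %%EOF that this drops, so a byte-for-byte compare against a manually extracted copy can still show a small difference at the very end even when the PDF opens fine.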
Hope this helps!