PLS applies one of two algorithms, MD5 or SHA1, to each file, producing a short hash value that acts as a fingerprint of the file's contents. Hash values are used to authenticate documents, since any alteration to a file changes its hash, and to identify duplicate files, which are then removed. MD5 or SHA1 de-duplication is performed on the original set of data to ensure that only exact duplicates are removed.
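The hashing and de-duplication step can be illustrated with a minimal Python sketch. It is not PLS's actual tooling; the function names `file_hash` and `find_duplicates` are hypothetical, and the sketch assumes the files are readable from local disk.

```python
import hashlib


def file_hash(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)  # "md5" or "sha1"
    with open(path, "rb") as handle:  # read-only, so the original is untouched
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths, algorithm="md5"):
    """Group files by hash; files after the first with a given hash are duplicates."""
    seen = {}        # hash value -> first path seen with that hash
    duplicates = []  # (duplicate path, path it duplicates)
    for path in paths:
        value = file_hash(path, algorithm)
        if value in seen:
            duplicates.append((path, seen[value]))
        else:
            seen[value] = path
    return seen, duplicates
```

Because the hash is computed over the file's bytes, two files match only if their contents are identical, which is why this approach removes exact duplicates and nothing else.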
PLS extracts text from each file and populates a corresponding field in a new engineered database. Additional database fields are populated with extracted metadata, that is, information about the document including author, recipient, date created, date modified, and original path. This important step allows the text and metadata from all files to be instantly searchable in the database without manipulating or changing the original documents.
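As a rough illustration of loading text and metadata into searchable database fields, the sketch below uses Python's built-in `sqlite3` module. The table layout, the function names `build_index` and `search`, and the restriction to plain-text files are all assumptions made for the example; extracting text, author, or recipient from formats such as email or Word documents would need format-specific parsers that are not shown.

```python
import os
import sqlite3
from datetime import datetime, timezone


def build_index(paths, db_path=":memory:"):
    """Load each file's text and basic file-system metadata into a table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "original_path TEXT PRIMARY KEY, size_bytes INTEGER, "
        "date_modified TEXT, body_text TEXT)"
    )
    for path in paths:
        info = os.stat(path)
        modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat()
        # Files are opened read-only, so the originals are never changed.
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            text = handle.read()
        conn.execute(
            "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?)",
            (str(path), info.st_size, modified, text),
        )
    conn.commit()
    return conn


def search(conn, term):
    """Return the original paths of documents whose text contains the term."""
    rows = conn.execute(
        "SELECT original_path FROM documents "
        "WHERE body_text LIKE ? ORDER BY original_path",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

Searches run against the database copy of the text, so reviewers can query every file at once while the source files stay exactly as collected.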
Optical character recognition (OCR) software is used on documents, pictures and PDFs that do not have extractable text, and corresponding engineered database fields are populated with the text results. The software “reads” the image and produces text with high accuracy, which is then imported into the engineered database and made searchable.
A database field is populated with a link to a read-only copy of the original file (Native Production) or a link to a TIFF or PDF image of the document (TIFF Production). Auto-date features are disabled and hidden workbooks are revealed. Native production is particularly useful for Excel spreadsheets, which often include formulas that would not appear in a printed version of the document. TIFF production is useful for attorneys who do not have the software needed to open the native files.
Unique identifiers and labels are applied to every page of each imaged document and to each native document using the Bates numbering system. TIFF images receive a Bates number at the bottom of each page, just like a paper production, while native documents receive a Bates number within the file name.
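A minimal sketch of sequential Bates numbering follows. The prefix `ABC`, the seven-digit width, the function names, and the convention of prepending the number to a native file's name are all hypothetical choices for illustration; actual productions follow whatever format the parties agree on.

```python
def bates_label(prefix, number, width=7):
    """Format one Bates number, e.g. ABC0000001."""
    return f"{prefix}{number:0{width}d}"


def assign_bates(documents, prefix="ABC", start=1, width=7):
    """Assign consecutive Bates ranges to (file_name, page_count) pairs.

    Imaged documents consume one number per page; a native file is passed
    with a page count of 1 so it consumes a single number.
    """
    next_number = start
    results = []
    for name, page_count in documents:
        pages = max(page_count, 1)  # every document gets at least one number
        begin = bates_label(prefix, next_number, width)
        end = bates_label(prefix, next_number + pages - 1, width)
        results.append({
            "original": name,
            "begin": begin,
            "end": end,
            # Assumed convention: the Bates number is prepended to the file name.
            "native_name": f"{begin}_{name}",
        })
        next_number += pages
    return results
```

For example, `assign_bates([("budget.xlsx", 1), ("memo.pdf", 3)])` gives the spreadsheet the single number `ABC0000001` and the three-page memo the range `ABC0000002` through `ABC0000004`.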