We have implemented a system for a renowned Czech law firm that processes the contents of legal documents, automatically creates an archive and enables intelligent search. A query to our search engine can be a previously solved legal problem. The system made it possible to process thousands of documents that lacked any metadata. The challenge lies in the fact that the system has to work with document semantics; processing based on keywords alone is not sufficient. We have delivered our know-how in the form of software licenses.
The law firm has an archive of more than 150,000 legal documents. These are contracts, orders, court rulings, and other legal documents. Our task is to process these documents, create a database for easy retrieval and provide automated summaries of the documents. This requires a system that can process the contents of the documents.
This task is not unique to law firms. Whenever there is a large number of documents with limited metadata, the same problem arises: an ordinary search is not sufficient, and the documents need to be preprocessed. First, they are sorted into categories to provide quick orientation. Next, general information is identified and distinguished from descriptive data. Descriptive data (names, addresses, specification of the case, etc.) are used for referencing, while general information is used for searching.
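The separation of descriptive data from general information can be illustrated with a minimal sketch. It is not the delivered software: the regular expressions, the placeholder labels and the function name `split_document` are assumptions made for illustration only, and a real system would more likely rely on a trained entity recognizer than on hand-written patterns.

```python
import re

# Hypothetical patterns for a few kinds of descriptive data.
# They are illustrative only and far from complete.
DESCRIPTIVE_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}\.\s?\d{1,2}\.\s?\d{4}\b"),
    "CASE_NO": re.compile(r"\b\d+\s[A-Z][a-z]+\s\d+/\d{4}\b"),
    "COMPANY": re.compile(r"\b[A-Z]\w*(?:\s[A-Z]\w*)*\s(?:s\.r\.o\.|a\.s\.)"),
}


def split_document(text):
    """Return (general_text, descriptive_data) for one document.

    Descriptive items are collected for referencing and replaced by
    placeholders, so that only general information remains for search.
    """
    descriptive = {}
    general = text
    for label, pattern in DESCRIPTIVE_PATTERNS.items():
        descriptive[label] = pattern.findall(general)
        general = pattern.sub(f"<{label}>", general)
    return general, descriptive


sample = (
    "Alfa Trade s.r.o. filed a claim on 12. 3. 2019 under file no. "
    "23 Cdo 1234/2018 for breach of a lease contract."
)
general, descriptive = split_document(sample)
print(general)
print(descriptive)
```

The collected values can link documents that share a party or case number, while the masked text is what a similarity search would operate on.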
The resulting system then provides an effective search, sorting documents by categories and linking them according to the descriptive data.
The main benefit for the client is easy access to a huge amount of proprietary information. This information also constitutes the client's crucial business know-how. Effective and easy access to this know-how gives the client a significant competitive advantage.
The main challenge is to automate the pre-processing of a large archive of various documents. It is necessary to identify their structure, find significant parts and categorize them hierarchically. In the solution, we used our software libraries in combination with a standard search engine.
Conventional document search systems work only with keywords. However, keywords are not sufficient for working with the document structure, determining the important parts, or categorizing documents.
Our algorithms can effectively represent the importance of the key parts of documents. This significantly improves search results and navigation.
The user is provided with the combined results of our algorithm and a standard search engine and can thus find answers to queries effectively. For example, to find relevant precedent court cases, the algorithm ignores descriptive data and retrieves only cases that are similar in their general conditions.
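One way to picture such a combination is the following minimal sketch. It makes several assumptions: a plain bag-of-words cosine similarity stands in for the proprietary algorithm, the scores of the standard search engine are assumed to arrive as a dictionary normalized to the range 0 to 1, and the names `rank`, `weight` and the sample archive are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity of the term-frequency vectors of two texts."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank(query_general, archive, keyword_scores, weight=0.5):
    """Order document ids by a weighted mix of two scores.

    Similarity is computed on the "general" field only, so descriptive
    data (parties, addresses, ...) does not influence it.
    """
    scored = []
    for doc_id, doc in archive.items():
        similarity = cosine(query_general, doc["general"])
        keyword = keyword_scores.get(doc_id, 0.0)
        scored.append((weight * similarity + (1 - weight) * keyword, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical archive: both cases involve the same party,
# but only case "A" resembles the query in its general conditions.
archive = {
    "A": {"general": "tenant failed to pay rent under lease contract",
          "descriptive": {"party": "Alfa Trade s.r.o."}},
    "B": {"general": "employee dismissed without notice period",
          "descriptive": {"party": "Alfa Trade s.r.o."}},
}
keyword_scores = {"A": 0.4, "B": 0.6}  # assumed output of a standard engine
query = "landlord claims unpaid rent under a lease contract"
ranking = rank(query, archive, keyword_scores)
print(ranking)
```

Setting `weight` to 1.0 would rank purely by similarity of general conditions, which corresponds to the precedent search described above; lower values give the keyword engine more influence.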