We have implemented our algorithms for understanding the contents of natural-language documents in Pathevo, an application from the American company Owen Software that facilitates education and career planning. We carried out a feasibility study, tuned our algorithms for commercial use and delivered our know-how in the form of contractual research and a subsequent licensing agreement.
We evaluate the extent to which subjects and courses listed by various schools have similar contents and levels. For example, we are able to determine whether language courses are taught in the same language, whether they use the same type of teaching, and whether they are of the same difficulty. Our know-how does not rely only on keyword search, but on understanding the content of the natural text. The algorithm understands the text well enough to evaluate the similarity of courses on a scale of 0% to 100% with a success rate comparable to that of a qualified human.
We have all at some time or another had to decide which school to apply to and which subjects to take with our future career in mind. We look for the best path to lead us through the educational and career processes. Such a planning service is provided by the Pathevo application from Owen Software.
For the applicant, planning usually means studying the subject and course descriptions from different schools and universities and, on the basis of existing qualifications and aims, writing a study and career plan.
Our task was to automate this process. Using our algorithms for understanding natural text we enable computers to understand course descriptions, evaluate their similarities and help the applicant to choose the correct study programme.
The main challenge was to automate the processing of a huge amount of information in the form of free text. This means working with tens of thousands of course descriptions. We used the results of our research in the form of a software library which we had already prepared for use in product innovation.
Why is it such a difficult task? The information is widely available on the internet, but it is written in natural language without a prescribed structure and there is a huge amount of it. For example, in the USA there are around 420,000 real job offers, 80,000 internships, 47,000 vocational schools and universities and 15,000 options for financing studies.
A conventional search using keywords is not enough as it does not compare the contents and levels of courses, which is information that we need for creating a study plan.
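The limitation can be illustrated with a minimal sketch. It is not the Pathevo algorithm: it is a plain keyword-overlap baseline (Jaccard overlap of word sets), and the three course descriptions as well as the function names `keyword_set` and `keyword_overlap_percent` are invented here for illustration.

```python
import re


def keyword_set(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_overlap_percent(a: str, b: str) -> float:
    """Jaccard overlap of the two keyword sets, expressed on a 0-100 scale."""
    words_a, words_b = keyword_set(a), keyword_set(b)
    union = words_a | words_b
    if not union:
        return 0.0
    return 100.0 * len(words_a & words_b) / len(union)


# Hypothetical course descriptions (assumptions, not taken from any real school).
beginner_a = "Introductory Spanish for students with no prior knowledge"
beginner_b = "Spanish language basics aimed at complete beginners"
advanced = "Advanced Spanish for students with prior knowledge of grammar"

# Two beginner courses described in different words.
same_level = keyword_overlap_percent(beginner_a, beginner_b)

# A beginner course and an advanced course that happen to share many words.
different_level = keyword_overlap_percent(beginner_a, advanced)

print(f"same level, different wording: {same_level:.1f}%")
print(f"different level, shared wording: {different_level:.1f}%")
```

In this sketch the two beginner courses score lower than the beginner and advanced pair, because the baseline counts shared words and has no notion of course content or level.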
This is why we used our algorithms for working with semantics. For example, we are able to recognise similar Spanish language courses of similar difficulty at different schools. The algorithm creates its own interpretation of what a Spanish language course description looks like and understands the contents of the description.
This is the method used in the commercial Pathevo system. We measured the success rate of the system and achieved the same results as an expert who evaluated the similarities manually.