It downloads the accepted papers listed on the EMAS 2019 page. Each PDF is converted to plain text using Apache Tika, and the text is then vectorized with the Google Universal Sentence Encoder. The vectors are compared pairwise to produce correlations, which range from 0 to 100% similarity. These values are presented in an Altair correlation matrix.
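The pairwise comparison step can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: it assumes the comparison is a cosine similarity, and it swaps the Universal Sentence Encoder for a hypothetical word-count `embed` function so the example runs with the standard library alone. The function names are invented for this sketch.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a sentence encoder: a word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    # An empty document has no direction, so treat it as unrelated.
    return dot / norm if norm else 0.0


def similarity_matrix(texts):
    """Return an n x n matrix of similarities expressed as percentages."""
    vectors = [embed(t) for t in texts]
    return [[round(100 * cosine(u, v), 1) for v in vectors] for u in vectors]


if __name__ == "__main__":
    docs = [
        "agents plan actions",
        "agents plan actions",
        "unrelated words here",
    ]
    for row in similarity_matrix(docs):
        print(row)
```

With a real encoder, only `embed` would change; the matrix construction stays the same, and each cell can be fed directly to a heatmap.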
I use it to find correlations across the many papers and books I use in my research. Since I use Mendeley, all of them are in a single flat folder. The project text-correlation retrieves all documents from a local folder and creates an n x n correlation matrix in a .csv file. This works better for a larger number of documents and is suitable for opening in a spreadsheet application.
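A minimal sketch of writing such an n x n matrix to a .csv file is shown below. The layout (document names as both the header row and the first column) and the function name `write_matrix_csv` are assumptions for illustration; the actual text-correlation output format may differ.

```python
import csv


def write_matrix_csv(path, names, matrix):
    """Write an n x n matrix with document names as row and column labels."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header row: an empty corner cell followed by one column per document.
        writer.writerow([""] + list(names))
        for name, row in zip(names, matrix):
            writer.writerow([name] + list(row))


if __name__ == "__main__":
    # Hypothetical file names and similarity values, for illustration only.
    names = ["paper_a.pdf", "paper_b.pdf"]
    matrix = [[100.0, 42.5], [42.5, 100.0]]
    write_matrix_csv("correlation.csv", names, matrix)
```

Opened in a spreadsheet, each cell then shows the similarity between the document in its row and the document in its column.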