Overview
We are a team of researchers from Ruhr University Bochum (Chair for Network and Data Security) specializing in document security, with a focus on PDF (Portable Document Format) files. This project investigates the usage and security of PDF documents that are publicly available on websites listed in the Tranco TOP rankings. Our goal is to understand how PDFs are used “in the wild” and to identify potentially dangerous practices, such as the use of obsolete cryptographic algorithms and forbidden features within these files.
Objectives
- To crawl and download publicly accessible PDF documents from websites on the Tranco TOP list.
- To analyze these PDFs for a variety of security metrics, including but not limited to cryptographic strength and the presence of forbidden features.
- To estimate real-world usage patterns of PDF documents and their potential vulnerabilities.
- To inform and educate web administrators and the public about the potential risks associated with insecure PDF practices.
Crawling Practices
We adhere to the best current practices for web crawling, ensuring that we respect the website owners’ terms and conditions for accessing their content:
- Robots.txt Compliance: We obey the directives in the robots.txt files of all websites we crawl. This ensures that we only access content that is allowed to be crawled.
- Rate Throttling: To minimize the impact on the websites we visit, we throttle our crawling speed to avoid overloading servers or causing disruptions.
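As an illustration only, the two practices above could be combined in a small per-host gate like the following Python sketch. It is not our crawler's actual code: the class name PoliteGate, the user-agent string, the 5-second default delay, and the choice to treat hosts without loaded rules as disallowed are all assumptions made for this example. Only the standard-library urllib.robotparser module is used.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "pdf-research-crawler"  # hypothetical user-agent string
DEFAULT_DELAY = 5.0                  # hypothetical minimum seconds between requests per host


class PoliteGate:
    """Decides whether and when a URL may be fetched (illustrative sketch)."""

    def __init__(self, user_agent=USER_AGENT, default_delay=DEFAULT_DELAY,
                 clock=time.monotonic):
        self.user_agent = user_agent
        self.default_delay = default_delay
        self.clock = clock
        self._rules = {}         # host -> parsed robots.txt rules
        self._last_request = {}  # host -> clock value of the last request

    def load_rules(self, host, robots_txt):
        """Parse the text of a host's robots.txt (fetched elsewhere)."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self._rules[host] = parser

    def allowed(self, url):
        """True only if rules are loaded for the host and they permit the URL."""
        parser = self._rules.get(urlsplit(url).netloc)
        # Assumption: a host without loaded rules is treated as disallowed.
        return parser is not None and parser.can_fetch(self.user_agent, url)

    def wait_time(self, url):
        """Seconds to wait before the next request to this URL's host."""
        host = urlsplit(url).netloc
        delay = self.default_delay
        parser = self._rules.get(host)
        if parser is not None:
            crawl_delay = parser.crawl_delay(self.user_agent)
            if crawl_delay is not None:
                # Honor a Crawl-delay directive when it is stricter than ours.
                delay = max(delay, float(crawl_delay))
        last = self._last_request.get(host)
        if last is None:
            return 0.0
        return max(0.0, last + delay - self.clock())

    def record_request(self, url):
        """Remember that a request to this URL's host was just made."""
        self._last_request[urlsplit(url).netloc] = self.clock()
```

A crawler loop built on this sketch would check allowed(url) first, sleep for wait_time(url) seconds, perform the download, and then call record_request(url). Taking the larger of the default delay and any Crawl-delay directive is a design choice of the example, so that a site can slow the crawler down but never speed it up.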
Opt-Out Policy
We understand that some website owners may not wish to have their domains included in our research. We maintain an Opt-Out list to accommodate such requests. If you would like your domain to be excluded from our study, please send an email to pdf-insecurity@lists.ruhr-uni-bochum.de, and we will promptly remove your domain from our scanning list.
Contact Information
For any questions, suggestions, or collaborations, please feel free to reach out to us at pdf-insecurity@lists.ruhr-uni-bochum.de.
By contributing to this project, you are helping to improve the state of document security, providing valuable insights that could lead to safer online environments for all users. Thank you for your interest and support.