You need to be able to reproduce your open source investigations. Here’s why. And how.

Read Time 7 mins | Written by: Intlabs


If you take one thing away from this blog post, let it be this: you need to discover and codify your baseline and your minimum method so that you can reproduce your open source investigations. But what is a minimum method? And how does it relate to OSINT?

Your open source investigations need to be reproducible

Open Source Intelligence (OSINT) sources like social media platforms, news, public records and databases, and other publicly available information have become the cornerstone of modern research. Why? These data sources offer immense variety, which makes them one of the best ways to corroborate proprietary sources. A single data source can have myriad use cases, from understanding user sentiment and impact to conducting deep forensic investigations. The near-real-time nature of open source data also makes it critically important when researching cases that are ongoing or still evolving. However, and very importantly, the versatility and open access of these data sources also make them dangerous: it can be challenging to establish their credibility, their biases, and the legal basis for using them.

“Without a reproducible investigative method, OSINT is nothing more than a bunch of cool tools that deep dive for data.”

- Craig Pederson, Head of Digital Forensics at TGC

Just like any scientific research, the credibility and value of open source investigations hinge on their reproducibility. Achieving reproducibility in open source investigations, however, comes with both obvious and less obvious challenges.

Because online content is dynamic, data can change or disappear. Platform-specific algorithms can alter how that data is presented, biasing lines of investigation and conclusions from case to case. Retaining data can be tricky too, because you have to protect the privacy and respect the residency requirements of any personally identifiable information (PII) you collect. This is where a reproducible methodology or framework comes in: it helps you demonstrate that findings are verifiable, conclusions are reliable, and methods are transparent and unbiased.

You need a baseline

A baseline is essentially a standard or reference point against which all subsequent data can be compared and analyzed. In the context of OSINT, this involves understanding the normal patterns, behaviors, or conditions in a given scenario before any unusual activity or anomalies are identified. With a properly established baseline, a negative result (data missing) can sometimes be just as helpful as a positive result. Determining normal patterns for sources can help you establish authenticity in the future, as misinformation typically does not follow normal (or ‘organic’) patterns. And you can use baselines to help frame the intended use of your data source. With open source data often used for new and unexpected purposes, it’s essential that you frame the original methods and purpose so that you can understand and be mindful of the limitations of the data.
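To make the idea concrete, here is a minimal Python sketch of one possible baseline: the typical number of posts per day from a source, and a check for how far a new observation sits from it. The function names, the sample counts, and the threshold of three standard deviations are all assumptions chosen for illustration, not a prescribed method; a real baseline would depend on the source and the question being asked.

```python
from statistics import mean, stdev


def build_baseline(daily_counts):
    """Summarize historical daily post counts as a mean and a sample standard deviation."""
    return {"mean": mean(daily_counts), "stdev": stdev(daily_counts)}


def deviation_from_baseline(baseline, observed):
    """Return how many standard deviations `observed` sits from the baseline mean."""
    if baseline["stdev"] == 0:
        # A perfectly flat history: any different value is treated as an extreme deviation.
        return 0.0 if observed == baseline["mean"] else float("inf")
    return (observed - baseline["mean"]) / baseline["stdev"]


def is_anomalous(baseline, observed, threshold=3.0):
    """Flag observations at least `threshold` standard deviations from normal (assumed cutoff)."""
    return abs(deviation_from_baseline(baseline, observed)) >= threshold


# Hypothetical history: posts per day from one account over a week.
history = [4, 5, 6, 5, 4, 6, 5]
baseline = build_baseline(history)

# A day of silence is a "negative result" that the baseline makes meaningful.
silent_day_flagged = is_anomalous(baseline, 0)
typical_day_flagged = is_anomalous(baseline, 5)
```

With this history, a day with no posts is flagged and a day with five posts is not, which is the sense in which missing data can be as informative as data that is present.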

You need to capture and store data properly

This process is not just about accumulating data. You need to strategically gather relevant and actionable intelligence. In OSINT, data capture can range from simple tasks like screenshotting social media posts to more complex activities like scraping websites or capturing satellite imagery. No matter what the source is, documenting your method of capture and providing a consistent basis for collection and analysis is your best defense against biases, misinformation, and potential legal issues with retention down the line. The legality of data capture methods, especially with regard to user privacy and website terms of service, is a vital piece of the puzzle and needs to be navigated carefully. Because while information may be publicly available, the way you capture or use it can have massive legal implications for your business.
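One way to document a capture, sketched here in Python, is to store a small record next to every captured item: where it came from, when and how it was captured, why, and a SHA-256 digest of the exact bytes. The field names and the `make_capture_record` and `verify_capture` helpers are hypothetical, introduced only for this illustration; the digest simply lets you show later that the stored content has not changed since capture.

```python
import hashlib
from datetime import datetime, timezone


def make_capture_record(source_url, content, method, tool, reason):
    """Build a provenance record for one captured item (field names are illustrative)."""
    return {
        "source_url": source_url,
        "captured_at": datetime.now(timezone.utc).isoformat(),  # UTC timestamp of capture
        "method": method,    # e.g. "screenshot", "scrape"
        "tool": tool,        # tool and version used for the capture
        "reason": reason,    # documented reasoning for collecting this item
        "sha256": hashlib.sha256(content).hexdigest(),  # fingerprint of the exact bytes
        "size_bytes": len(content),
    }


def verify_capture(record, content):
    """Return True if `content` still matches the digest stored at capture time."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


# Hypothetical capture of a public post.
captured_bytes = b"Example post text captured from a public page"
record = make_capture_record(
    source_url="https://example.com/post/1",
    content=captured_bytes,
    method="scrape",
    tool="example-scraper 1.0",
    reason="Corroborating the timeline for case 0001",
)
```

If the source later changes or disappears, the record and the stored bytes together still show what was seen, when, and by what procedure.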

The best way to ensure that you’re complying with data protection laws is to establish and then stick to rigorous standards for capture and retention. In practice, this means keeping the scope of your capture tight and reasonable, and documenting your reasoning for handling the data each time you do it. For retention, tokenization and hashing are great methods for preserving privacy, because they let you keep the links between data without storing identifying information. While all of this can be achieved with a notebook, a file naming convention, and a great deal of discipline, more realistically most investigators employ a minimum method: a combination of procedures, tools, and policies that are followed for the capture of data in each case.
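As a rough illustration of hashing for retention, the Python sketch below replaces an identifier with a keyed hash (HMAC-SHA-256), so the same account always maps to the same token and links between records survive without the raw identifier being stored. The `pseudonymize` helper, the normalization step, and the sample key are assumptions made for this example. A keyed hash is used rather than a plain one because short identifiers such as usernames can otherwise be recovered by hashing guesses; the key has to be stored separately and protected.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Return a stable token for `identifier` using HMAC-SHA-256.

    Lowercasing and trimming is an assumed normalization so that
    "@Alice" and "@alice " map to the same token.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would come from a secrets store, not source code.
case_key = b"replace-with-a-randomly-generated-secret"

records = [
    {"author": "@Alice", "text": "first post"},
    {"author": "@bob", "text": "a reply"},
    {"author": "@alice", "text": "second post"},
]

# Store tokens instead of handles; records by the same author stay linkable.
retained = [
    {"author_token": pseudonymize(r["author"], case_key), "text": r["text"]}
    for r in records
]
```

Here the first and third retained records share a token, while the second has a different one. Note that this only removes the identifier field; free text can still contain identifying details and may need redaction of its own.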

What’s your minimum method?

  • What sources are you using?
  • What is the baseline?
  • What was the intent of the source?
  • What are the limitations?
  • What is your procedure to capture your data?
  • What tools are you using?
  • Can that procedure be repeated exactly and yield the same results?
  • If it doesn’t, how are you going to demonstrate and store the results of your capture?
  • What is the scope?
  • How are you protecting sensitive data and preserving privacy?
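
The checklist above can also be codified so that an unanswered question is hard to overlook. The following Python sketch is one possible shape, assuming a simple record with one free-text field per question; the `MinimumMethod` class, its field names, and the sample answers are hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass, fields


@dataclass
class MinimumMethod:
    """One free-text answer per checklist question (field names are illustrative)."""
    sources: str = ""
    baseline: str = ""
    source_intent: str = ""
    limitations: str = ""
    capture_procedure: str = ""
    tools: str = ""
    repeatability: str = ""
    fallback_evidence: str = ""  # how results are demonstrated if the capture cannot be repeated
    scope: str = ""
    privacy_measures: str = ""


def unanswered(method):
    """Return the names of checklist questions that are still blank."""
    return [f.name for f in fields(method) if not getattr(method, f.name).strip()]


# Hypothetical, partially completed method for one case.
draft = MinimumMethod(
    sources="Public posts from two named forums",
    capture_procedure="Scrape daily at a fixed time and hash each page",
    scope="Posts mentioning the incident during the week under review",
)

missing = unanswered(draft)
```

In this draft, `missing` lists the seven questions that still need an answer before the method can be called complete.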

Why Intlabs?

We’re passionate about helping analysts build repeatable flows for their OSINT analysis. Our platform, ORIGIN, helps you capture, retain, redact, and share information from a huge variety of sources, ensuring that you and your team stick to your minimum method.