You need to be able to reproduce your open source investigations. Here’s why. And how.
Written by: Intlabs
If you leave with one takeaway from this post, it is that you need to discover and codify your baseline and minimum method so that you are able to reproduce your open source investigations. What is a minimum method? How does it relate to OSINT? Read on to learn more.
Your open source investigations need to be reproducible
Open Source Intelligence (OSINT) sources, such as social media platforms, news, public records and databases, and other publicly available information, have become a cornerstone of modern research. Why? These sources offer immense variety, which makes them an especially effective way to corroborate proprietary sources. A single source can serve many use cases, from gauging user sentiment and impact to conducting deep forensic investigations. Because open source data is available in near real time, it is also critically important when researching cases that are still ongoing or evolving. However, and very importantly, the same versatility and open access that make these sources useful also make them dangerous: it can be hard to establish their credibility, their biases, and the legal basis for using them.
“Without a reproducible investigative method, OSINT is nothing more than a bunch of cool tools that deep dive for data.” - Craig Pederson, Head of Digital Forensics at TGC
As with any scientific research, the credibility and value of open source investigations hinge on their reproducibility. Achieving reproducibility in open source investigations, however, involves both obvious and less obvious challenges. Because online content is dynamic, data can change or disappear. Platform-specific algorithms can influence how that data is presented, biasing lines of investigation and conclusions from case to case. Retaining data is also tricky, because you must protect privacy and respect data residency requirements for any personally identifiable information (PII) it may contain. This is where a reproducible methodology or framework helps: it demonstrates that findings are verifiable, conclusions are reliable, and methods are transparent and unbiased.
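One simple way to make changing or disappearing content demonstrable is to fingerprint each capture at the moment you make it. The sketch below is a minimal Python illustration, assuming you already have the raw bytes of a page or post: it records the URL, a UTC timestamp, and a SHA-256 digest, so a later re-capture can be compared against the original. The function names and record fields are hypothetical, not part of any particular tool.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint_capture(url: str, content: bytes) -> dict:
    """Record what was captured, when (UTC), and a SHA-256 digest of the exact bytes."""
    return {
        "url": url,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
    }


def has_changed(earlier: dict, later: dict) -> bool:
    """Return True if two captures of the same URL have different content digests."""
    if earlier["url"] != later["url"]:
        raise ValueError("captures refer to different URLs")
    return earlier["sha256"] != later["sha256"]


# Hypothetical example: the same post captured before and after an edit.
first = fingerprint_capture("https://example.com/post/123", b"original post text")
second = fingerprint_capture("https://example.com/post/123", b"edited post text")
print(has_changed(first, second))  # True
```

If the digests differ, the content changed between captures; if a re-capture fails entirely, the stored record still shows what existed and when you saw it.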
You need a baseline
A baseline is a standard or reference point against which all subsequent data can be compared and analyzed. In OSINT, this means understanding the normal patterns, behaviors, or conditions in a given scenario before you try to identify unusual activity or anomalies. With a properly established baseline, a negative result (expected data that is missing) can sometimes be just as informative as a positive result. Knowing a source’s normal patterns can also help you assess authenticity later, because misinformation often does not follow normal (or ‘organic’) patterns. Baselines also help frame the intended use of your data source. Because open source data is often repurposed for new and unexpected uses, it’s essential to record the source’s original purpose and collection method so that you understand, and stay mindful of, the data’s limitations.
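To make the idea concrete, here is a minimal sketch of one simple baselining technique: a z-score check on daily post counts. It assumes you have per-day counts for a period you judge to be normal; the counts, dates, and the threshold of 3 standard deviations are made-up, illustrative assumptions rather than a recommended standard. An unusually low count is flagged as well, which is how a negative result shows up.

```python
from statistics import mean, stdev


def build_baseline(daily_counts: list[int]) -> tuple[float, float]:
    """Summarise a 'normal' period as its mean and sample standard deviation."""
    if len(daily_counts) < 2:
        raise ValueError("need at least two observations for a baseline")
    return mean(daily_counts), stdev(daily_counts)


def flag_anomalies(new_counts: dict[str, int],
                   baseline: tuple[float, float],
                   threshold: float = 3.0) -> list[str]:
    """Return days whose count lies more than `threshold` standard deviations
    from the baseline mean, in either direction (spikes and drop-offs)."""
    mu, sigma = baseline
    flagged = []
    for day, count in new_counts.items():
        if sigma == 0:
            # A perfectly flat baseline: any deviation at all is unusual.
            if count != mu:
                flagged.append(day)
        elif abs(count - mu) / sigma > threshold:
            flagged.append(day)
    return flagged


# Illustrative data: a quiet week, then three days under investigation.
baseline = build_baseline([40, 42, 38, 45, 41, 39, 44])
print(flag_anomalies({"2024-05-01": 43, "2024-05-02": 0, "2024-05-03": 180}, baseline))
# ['2024-05-02', '2024-05-03']
```

A z-score is only one of many ways to describe "normal"; seasonal sources, for example, may need a baseline per weekday or per hour instead of a single mean.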
And you need to capture and store data properly
The best way to ensure that you’re complying with data protection laws is to establish rigorous standards for capture and retention, and then stick to them. In practice, this means keeping the scope of your capture tight and reasonable, which in turn means documenting your reasoning for how you handle the data each time you do it. For retention, tokenization or hashing are great methods for preserving privacy, because they keep the links between data points without retaining identifying information. While this can be achieved with a notebook, a file naming convention, and a great deal of discipline, most investigators realistically rely on a minimum method: a combination of procedures, tools, and policies that are followed for the capture of data in every case.
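Here is a minimal sketch of hashing for retention, assuming only Python’s standard library. It uses a keyed hash (HMAC-SHA256) rather than a plain hash, because a plain hash of a short identifier such as a username can often be reversed by hashing likely candidates and comparing the results. The key, the sample records, and the lowercase normalization (which assumes handles are case-insensitive on the platform in question) are all illustrative assumptions.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 token (hex string).

    Under the same key, the same identifier always yields the same token, so
    links between records survive. Without the key, guessing identifiers and
    hashing them does not reproduce the tokens.
    """
    # Assumption: handles are case-insensitive, so normalize before hashing.
    normalized = identifier.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key: in practice, load it from a secrets store, never from code.
KEY = b"replace-with-a-key-from-your-secrets-store"

records = [
    {"author": "@SomeAccount", "text": "first post"},
    {"author": "@someaccount", "text": "reply"},
    {"author": "@OtherAccount", "text": "unrelated"},
]

# Keep the content, swap each author handle for its token.
redacted = [{"author": pseudonymize(r["author"], KEY), "text": r["text"]} for r in records]
```

Because the same handle always yields the same token under one key, you can still see that the first two posts came from the same account without storing the handle itself. Keep the key separate from the redacted data, since anyone holding both can test guesses.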
What’s your minimum method?
What sources are you using?
What is the baseline?
What was the intent of the source?
What are the limitations?
What is your procedure to capture your data?
What tools are you using?
Can that procedure be repeated exactly and yield the same results?
If it can’t, how are you going to demonstrate and store the results of your capture?
What is the scope?
How are you protecting sensitive data and preserving privacy?
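One way to codify these questions is to store the answers as a structured record alongside each case. The Python sketch below is a hypothetical schema, not a standard: the field names simply mirror the checklist, the example answers are invented, and `gaps()` reports which items are still unanswered before a capture begins.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MinimumMethod:
    """One case's answers to the minimum-method checklist (illustrative fields)."""
    sources: list[str]
    baseline: str
    source_intent: str
    limitations: str
    capture_procedure: str
    tools: list[str]
    reproducible: bool
    non_reproducible_plan: str = ""  # how results are demonstrated/stored if not reproducible
    scope: str = ""
    privacy_measures: str = ""

    def gaps(self) -> list[str]:
        """Return checklist fields that are empty, in field order."""
        missing = [name for name, value in asdict(self).items()
                   if name not in ("reproducible", "non_reproducible_plan") and not value]
        # A plan for non-reproducible captures is only required when reproduction isn't possible.
        if not self.reproducible and not self.non_reproducible_plan:
            missing.append("non_reproducible_plan")
        return missing


method = MinimumMethod(
    sources=["public forum posts"],
    baseline="post volume for the 30 days before the incident",
    source_intent="community discussion, not news reporting",
    limitations="self-selected users; posts can be edited or deleted",
    capture_procedure="save page, hash content, log UTC timestamp",
    tools=["browser", "capture script"],
    reproducible=False,
    scope="posts mentioning the incident keyword only",
)
print(method.gaps())  # ['privacy_measures', 'non_reproducible_plan']

# Serialize the record so it can be stored with the case files.
record = json.dumps(asdict(method), indent=2)
```

Storing the record as JSON next to the captured data means the method travels with the evidence, and a reviewer can check each answer without relying on the original investigator’s memory.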
We’re passionate about helping analysts build repeatable flows for their OSINT analysis. Our platform, ORIGIN, helps you capture, retain, redact, and share information from a huge variety of sources, ensuring that you and your team stick to your minimum method. We’d love to hear from you.