War Diaries Talk

How do we produce Operation War Diary's Data? Find out more in our new blog post!

  • ral104 by ral104 moderator, scientist

    Have you ever wondered how the tags you're adding to the diaries are actually used by our system? Wonder no more! Our new blog post describes the algorithm we use to produce our data: http://blog.operationwardiary.org/2014/06/16/how-does-operation-war-diary-generate-its-data/

  • paratsoukli by paratsoukli

    A fascinating exposition of the analysis technique, but I bet I'm not alone in wishing that this had been published at the start of the exercise, not several months in: the criteria revealed here would surely have affected some of my tagging. In certain documents - a densely packed casualty list or a diary page by an adj desperate to get a whole month in one page - trying to tag at the 'centre' of the calculated 'field' (in itself, a guess) would result in multiple tag overlaps; can the software unravel such a dense forest of tags?

  • ral104 by ral104 moderator, scientist

    Glad you found it interesting! I agree with you that we should have made all this clear much earlier on - we realise communication and feedback have been sorely neglected over the last few months and we're trying really hard to rectify that now.

    I've only been involved in the project for a month now, so I may have overlooked things you'd like to know about. Please feel free to ask if there's anything you think we should be covering, and I'll do my best to get answers for you.

    In terms of whether the system can deal with dense fields of tags, the short answer is: yes! Rough analysis shows that the tags are ~95% accurate, which for something of this magnitude is very impressive. The information associated with each tag is what allows us to identify the entities that each one relates to. So in crowded lists of names, we can pick out the five tags in a greater cluster which relate to an individual with a particular name or variants thereof. After that, there may be some physical overlap of tags, depending on where they were originally placed, but the important thing is that the associated data has been preserved and we can now use it to index that page.
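
    To make the idea of picking out tags by name a little more concrete, here is a rough sketch in Python. It is only an illustration and not the project's actual code: the tag layout (a dict with "name", "x" and "y"), the function names and the 0.75 similarity threshold are all assumptions made for the example. It uses the standard library's difflib to treat close spellings as variants of the same name, and it groups purely on the name text, so overlapping positions on the page do not matter.

```python
from difflib import SequenceMatcher


def names_match(a, b, threshold=0.75):
    """Return True if two names are close enough to count as variants.

    The threshold is an illustrative assumption, not a project setting.
    """
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


def group_tags_by_name(tags, threshold=0.75):
    """Group tags whose names match the first tag of an existing group.

    Each tag is a hypothetical dict with "name", "x" and "y" keys.
    Position is carried along but is not used for grouping.
    """
    groups = []
    for tag in tags:
        for group in groups:
            if names_match(tag["name"], group[0]["name"], threshold):
                group.append(tag)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([tag])
    return groups


# Hypothetical tags placed close together on a crowded casualty list.
tags = [
    {"name": "Smith", "x": 102, "y": 340},
    {"name": "Jones", "x": 105, "y": 352},
    {"name": "Smyth", "x": 99, "y": 338},
    {"name": "SMITH", "x": 110, "y": 345},
    {"name": "Jones", "x": 101, "y": 355},
]

for group in group_tags_by_name(tags):
    print([t["name"] for t in group])
```

    A real pipeline would presumably also take the page, the tag type and the position into account; this sketch only shows why physical overlap need not get in the way once each tag carries its own data.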

    Hope that helps!