War Diaries Talk

Tagging data for use by academics

  • rsgrayson by rsgrayson scientist

    I am writing an article on the day-to-day activities of the British Expeditionary Force drawing entirely on data from Operation War Diary. The work which has been done is invaluable and huge thanks are due to all volunteers.

    I am focusing on data for the first six British divisions to arrive in France, along with some information on 1 and 2 cavalry divisions. I have noticed a few patterns which I thought I should flag so that there can be further improvements in accuracy on tagging.

    The first is the need to make sure that some activity is tagged for every day. For example, looking at data on all the infantry battalions tagged so far, there are 61,979 different dates tagged. However, there is only a tag for some kind of activity on 46,331 of those days. So we have data for about 75% of the days but are missing it on around 25% of the days.

    Second, I also wanted to flag the importance of using as many tags as possible on each day - for example, if you look for the 'in the line' tag, you will often find it missing even when a battalion is under fire, and in exactly the same place as it was either side of the day in question. So even if the main activity of a battalion seems to be under fire or attacking, it is still worth tagging them as being in the line. I know that OWD presents users with a complex series of tags and that there is a lot to do, but I would be very grateful for extra attention being paid to troop activity.

    Thank you for all of your hard work so far. The data that you have generated are great, and by fine tuning things a bit at this stage I can focus even more precisely on my research question. Please be in touch with any questions or comments—I’d love to hear from you.


    Prof. Richard Grayson
    Goldsmiths, University of London
    r.grayson@gold.ac.uk

    Posted
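
The coverage figures quoted in the post above can be checked with a few lines of arithmetic. This is only a sketch using the two counts given in the post; the variable names are illustrative and do not correspond to any field in the OWD data export.

```python
# Counts quoted in the post above (infantry battalions tagged so far).
dates_tagged = 61_979          # distinct dates carrying a date tag
dates_with_activity = 46_331   # of those, dates that also carry an activity tag

# Days that have a date tag but no activity tag at all.
missing = dates_tagged - dates_with_activity

covered_pct = 100 * dates_with_activity / dates_tagged
missing_pct = 100 * missing / dates_tagged

print(f"{missing} days lack an activity tag")
print(f"covered: {covered_pct:.1f}%  missing: {missing_pct:.1f}%")
```

The result agrees with the rounded "about 75%" and "around 25%" given in the post.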

  • marie.eklidvirginmedia.com by marie.eklidvirginmedia.com

    I am tagging the 6th Division Royal Field Artillery. A question on 'in the line': I would presume that if they are at gun post positions they would not be in the trenches. I assume the trenches are the Front Line. If they are at gun post positions, should they be tagged 'in the line'? I have read your article with interest. Your work concerning the BEF sounds very interesting.

    My Grandfather went out with the BEF to France on 21 May 1915 (Kitchener's New Army). Joined the 41st Brigade, 14th Light Division - Rifle Brigade. 2nd Battle of Ypres: 30 & 31st July 1915 - Hooge (German Liquid Fire Attack). 25th Sept 1915 - Second attack on Bellewaarde. Served until 29th April 1916. Wounded and returned home. He was also a veteran of the Boer War. I have been told there are no diaries for the Rifle Brigade, which is unfortunate - it would have been very interesting to tag them.

    Posted

  • rsgrayson by rsgrayson scientist

    Thanks Marie. I'd suggest that 'in the line' for artillery should equate to gun post positions as that is their 'front' line. I think what being 'in the line' essentially means is about being at action stations, and does not have to mean trenches in particular. Interesting to hear of your family. My great-uncle was killed at Hooge on 25-9-1915 in the 2nd Royal Irish Rifles.

    Posted

  • marie.eklidvirginmedia.com by marie.eklidvirginmedia.com

    Thank you for your quick reply to my question of 'in the line'. This has answered my query regarding their 'front line' and makes it clearer. Sorry to hear your news of your Great Uncle. I have some photos of the mine craters in Hooge Chateau grounds. Such a desolate place. Good luck with your research.

    Posted

  • ral104 by ral104 moderator, scientist

    Thanks for the update, @rsgrayson - really looking forward to reading the article. It's great to see OWD data feeding into active research like this and I hope your work will be the catalyst for more to come.

    I'll feature this post, to make sure people see it.

    Posted

  • rsgrayson by rsgrayson scientist

    Thanks ral104 and Marie again!

    Posted

  • bje by bje in response to rsgrayson's comment.

    Great to hear something about how the data is being used.

    So far I've been careful only to tag what I can actually see in each day's diary, and not to make any assumptions. Would it be more useful to add tags where I can infer from previous entries that an activity is taking place?

    Posted

  • ral104 by ral104 moderator, scientist

    I think the answer is probably yes, but only when the inference is very clear. Thanks! 😃

    Posted

  • rsgrayson by rsgrayson scientist

    Sorry for the delay bje - yes, I think ral104 offers the right advice here.

    Posted

  • bje by bje

    Thanks for replies - will do

    Posted

  • deehar by deehar in response to ral104's comment.

    I seem to remember some discussion about this in the past when it was clearly pointed out that we were not intelligent enough to make assumptions and we should stick to tagging what was there and not what we thought should be there. Interpretation should be left to the historians who earn their keep by filling in the blanks.

    Posted

  • ral104 by ral104 moderator, scientist

    A couple of points in response to that:

    1. Nobody is here to denigrate anybody else's intelligence.

    2. Placing additional tags in this context is more in the nature of interpolating from the data, rather than extrapolating from it. Where it's clear activity extends across days, an additional tag or two gives us a fuller data set. That was something we hadn't necessarily thought of in the early days, but with the project having run for over 18 months now, our understanding of things has evolved somewhat.

    @rsgrayson's work will be a great showcase for the data we are producing and I hope other academics will realise what a fantastic resource is being developed. However, the data is in the public domain and I'd encourage anybody with any interest in the First World War to use it to explore their own research questions.

    Anybody who wants to get hold of the raw data can find it here: http://wd3.herokuapp.com/public

    Posted

  • deehar by deehar in response to ral104's comment.

    I didn't mean actual intelligence. I was referring to the amount of interpretation that we were expected to do. If we are to do more (interpretive) tagging does that mean:

    1. Tagging "Place" and "Unit Activity" where it is obvious but the author has left it blank or made a comment "same as yesterday" or "nothing of interest happened".
    2. Tagging items like Orders received, Officers meetings, situation descriptions, proposals for improvements etc as Army Life - Other
    3. Tagging so-called "routine" activities like evacuating casualties from Field Ambulance, horses from Veterinary Units

    This would certainly reduce the percentage of days without a tag but I assume you are only talking about item 1 in your post.

    Posted

  • ral104 by ral104 moderator, scientist

    Thanks, @deehar. Yes, point 1 is essentially what I meant. Point 2 I think is slightly more subjective. There are instances where I tag things like this and instances where I don't. It's difficult to be prescriptive there; it's more of a judgement call for you as a tagger. As a rule of thumb, I'd say if in doubt, tag!

    Point 3, tagging routine activities is something we've always advised against, but as I say, our understanding of the data is evolving, so it's always good to re-examine our approach. My own feeling is that we should still probably not tag these, but if others feel differently, please let us know!

    Posted

  • HeatherC by HeatherC moderator

    I'd agree with Rob about not tagging routine activities or it makes an awful lot of work for the tagger, which I'm not sure is repaid by better data? Totally agree on your point 1 though.

    Posted

  • cyngast by cyngast moderator

    Is it stretching the term to tag an advanced dressing station for a field ambulance unit as being "in the line"? This one, from the 141st (Secunderabad) Field Ambulance, is in Authuille which is just west of Thiepval in Sept. 1915, when Thiepval was held by the Germans.

    Of course, the tricky part is that the diary for a field ambulance is usually written by the C.O. who is usually at the location of the field ambulance itself rather than the advanced dressing station. In other words, only part of the unit is at the advanced dressing station while the remainder are back a few miles.

    Another question: If a senior officer "visits" or "goes round" the trenches, or in this case the advanced dressing station, could that be tagged as an inspection?

    Posted

  • ral104 by ral104 moderator, scientist

    I think you could legitimately use 'in the line' here, following the same logic as applied to artillery units earlier. For a Field Ambulance, this is their front line, and although the dressing station and other parts of the unit are sited in different locations relative to the battle line, it's not quite the same as the dressing station being detached, because this is exactly how they were supposed to operate.

    The inspection tag is slightly less clear cut. In most cases, I'd imagine you would be fine to use it, but you do also get instances of staff officers (and others) visiting for instructional purposes, so it's probably a judgement call based on info on that particular diary page.

    Posted

  • thisguyiknow by thisguyiknow

    Just a general response to the matter of untagged dates—I am working with a cavalry unit's diary in which (prior to their going into the line) several days' entries read merely "Nothing to report," or the like.

    During that same period, the author makes entries the dates of which are in the format "[say, January] 8 - 14." The entry might then say "Training." This sort of thing may explain at least some of the dates lacking tags. Or ought one to date-tag such an entry 7 times?

    Posted

  • ral104 by ral104 moderator, scientist

    Good point. In that instance it's best to tag the range - so place one date tag with the earlier date and then one with the later date, so the information is bounded by both.

    Posted
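
For anyone working with the data afterwards, a pair of bounding date tags like this can be expanded back into the individual days it covers. The sketch below assumes the two tags have already been read as dates; the function name and the year are hypothetical, chosen only to illustrate an entry dated "January 8 - 14".

```python
from datetime import date, timedelta


def days_in_range(first: date, last: date) -> list[date]:
    """Return every day from first to last, inclusive of both ends."""
    if last < first:
        raise ValueError("last must not be earlier than first")
    span = (last - first).days
    return [first + timedelta(days=offset) for offset in range(span + 1)]


# Hypothetical entry dated "January 8 - 14"; the year is an assumption.
covered = days_in_range(date(1916, 1, 8), date(1916, 1, 14))
print(len(covered), "days covered by one entry")
```

Two date tags therefore stand in for all seven days, so there is no need to place seven separate tags.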

  • thisguyiknow by thisguyiknow

    This is an alert and apology. Until just now, the google maps insert in the "Place" dialogue showed no map features on my machine—just a blank field. When entering a place name returned more than one suggestion, I took it to mean there was a simple duplication in the database. But suddenly, the map works! Hooray, but Oooh, Noooo! Now that I can see the terrain and names of nearby towns, I was able to tell that it is the second suggestion for "Jonquières" which is the correct town. There have been several such "duplicated" place names in the diary on which I have been working. I will find my profile later today and check the suspect entries, if my browser is still showing the map data. [Incidentally, I discovered this while tagging the last word of the last entry in the diary: timing, it seems, is everything!]

    Posted

  • thisguyiknow by thisguyiknow in response to ral104's comment.

    OK, thanks for the suggestion—I'll go back over my profile and insert the 'end of range' dates. Also, please see my message of chagrin immediately above: I'll take care of both problems as I go through.

    Posted

  • thisguyiknow by thisguyiknow

    Well, I thought I knew how to go back to re-tag pages, but that seems not to be the case. Found images of 10 recent pages under my profile, but am not able to do work on them. Is there a way to revise them? Sorry, I seem to have made a mess of things.

    Posted

  • marie.eklidvirginmedia.com by marie.eklidvirginmedia.com

    Will this help you - or do you know this already? Ral the Moderator previously told me how to edit my profile pages and I posted it into The Mess Hall section under Useful Tips.

    (1) Go to your Profile page - this brings up your recent pages. It should normally show the last 10 pages. Select the page you want. Go to the comment box part - put the cursor to the RIGHT HAND SIDE OF YOUR NAME and it will bring up Edit/Remove. Press Edit - you can then put in what you need to say or change (it will only allow so many words). When finished, press update.

    Or (2) Go to - Currently On Line page. Go into your name. This will bring up ALL the pages you have commented on. Find the page you want. When the page comes up go to comment box part and use the same procedure as above.

    Posted

  • cyngast by cyngast moderator

    I don't believe there is any way to go back and correct actual tags from the menu on the left side of the page on the diary page itself. The best you can do is to go back to the pages you want to correct and put a correction in the comment box. However, beyond the last 10 pages you have tagged, you can find only those pages where you have previously made a comment in the comment box.

    But don't worry about any errors in tagging. The way this system is set up, five people tag each diary page. Then someone else higher up in the project reviews those and settles on the best and/or most frequent results of the tags. So if you have mis-tagged a place, and the other four taggers got it right, the end information available to researchers will have the correct data.

    I know that I have made mistakes. Just today, I failed to change the day in a date I tagged. I have also clicked on the Finish button instead of the Comments box icon more times than I can keep track of when I'm only halfway through a page. Did that just yesterday.

    Posted
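
To illustrate why one mis-tagged place is unlikely to matter, here is a minimal sketch of picking the most frequent answer among five taggers. It is not the project's actual review process, which the post above describes as a person settling on the best and/or most frequent results; the function name and the place labels are hypothetical.

```python
from collections import Counter


def most_frequent_tag(tags: list[str]) -> tuple[str, int]:
    """Return the most common tag and the number of taggers who chose it."""
    if not tags:
        raise ValueError("at least one tag is required")
    tag, votes = Counter(tags).most_common(1)[0]
    return tag, votes


# Hypothetical place tags from five taggers for the same diary entry:
# one picked the wrong suggestion, four picked the right one.
place_tags = [
    "second suggestion",
    "first suggestion",
    "second suggestion",
    "second suggestion",
    "second suggestion",
]

tag, votes = most_frequent_tag(place_tags)
print(f"{tag} chosen by {votes} of {len(place_tags)} taggers")
```

With four taggers in agreement, the single wrong choice is simply outvoted.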

  • thisguyiknow by thisguyiknow in response to marie.eklid@virginmedia.com's comment.

    Marie - That's outstanding! Thanks for both methods. While I won't be able to check the map insets, this will make it possible to indicate the range of dates covered by multiple-day entries. It's clear I need to sample the fare at the Mess Hall & get myself in the picture. Many thanks for taking the trouble to repeat here what you have already posted there!

    Posted

  • thisguyiknow by thisguyiknow in response to cgastwein@aol.com's comment.

    Thanks for posting this kind note - it's very helpful to know in advance the limits of what the system will permit. I appreciate, too, the reassurance that there is fairly deep redundancy built in: one can start to feel 'alone in the foxhole,' and that only one's accuracy stands between Truth and Chaos... Also, thanks for the reminder that even a grizzled veteran accepts that he or she makes mistakes, and even has 'favorite' errors. Which of my broad repertoire of gaffes will survive to become ruefully accepted mascots? Time will tell! Thank you very much.

    Posted

  • thisguyiknow by thisguyiknow in response to marie.eklid@virginmedia.com's comment.

    Um, sorry, mixed-metaphor alert!

    Posted

  • ral104 by ral104 moderator, scientist

    Marie and Cynthia are absolutely right - we all make mistakes (including us not making the five-taggers system clear enough in the tutorial!). Don't worry about it at all. We're very glad to have you here and appreciate the time you're putting in.

    Anything we can do to help, just ask.

    Posted

  • 141Dial34 by 141Dial34

    Ral, if 5 people are needed to complete a diary, why doesn't the % completed show 20% now that I have finished the diary?

    Posted

  • ral104 by ral104 moderator, scientist in response to 141Dial34's comment.

    That's because the five taggers rule applies to each page. So, even if you've completed the diary it would still show as 0% if you're the only person to have done it. It will show as 20% once 20% of the pages are completed by 5 people. 😄

    Posted

  • deehar by deehar in response to ral104's comment.

    Aha! So even if all pages have been done by 4 people it will still register 0% even though 80% of the work has been done!
    The fifth person to classify that diary would see a nice rise in percentage after every page he completed.

    Posted
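
The progress rule described in the last few posts can be sketched as follows. This assumes only what is stated above, namely that a page counts as complete once five people have tagged it; the function and its input (a list of tagger counts, one per page) are hypothetical and not taken from the site's code.

```python
REQUIRED_TAGGERS = 5  # per-page threshold described in the thread


def percent_complete(taggers_per_page: list[int],
                     required: int = REQUIRED_TAGGERS) -> float:
    """Percentage of pages that have reached the required number of taggers."""
    if not taggers_per_page:
        return 0.0
    done = sum(1 for count in taggers_per_page if count >= required)
    return 100 * done / len(taggers_per_page)


# A hypothetical ten-page diary in three states.
one_tagger_finished = [1] * 10          # one person has done every page
four_taggers_finished = [4] * 10        # four people have done every page
fifth_tagger_two_pages_in = [5, 5] + [4] * 8

print(percent_complete(one_tagger_finished))
print(percent_complete(four_taggers_finished))
print(percent_complete(fifth_tagger_two_pages_in))
```

Under this rule the first two cases both show 0%, and the percentage only starts to rise as the fifth tagger works through the pages.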

  • ral104 by ral104 moderator, scientist

    Indeed! Although, generally we seem to get five people working together on most diaries, so the progress meter ticks along nicely. The exceptions tend to be things like the ammunition columns and sanitary sections. If you've done one of them, it seems to take forever to get the counter to turn over.

    Posted

  • Misawa by Misawa in response to marie.eklidvirginmedia.com's comment.

    You might try the regimental museum of the Rifle Brigade. They would likely have some Great War information other than that in the PRO.

    Posted