From time to time (once every month or two, but twice in the last two weeks) we see our ODK form publisher go into the PAUSED state.
It’s meant to retry automatically (every 15-30min I think) and recover, but we are finding that if a POST to OpenFn failed once it doesn’t ever recover and we have to manually delete and re-create the publisher to recover.
It’s always a pain to figure out how many records have been missed and re-publish them only (we sometimes have >1k submissions and don’t just want to throw the whole lot at OpenFn as that’s messy, even though it could handle it fine.
I’m guessing the problem lies in the ODK code, and I’ll still post on their mailing list (which is currently in transition for 2-3 days), but my question to the OpenFn team is this:
Do you often see ODK JSON publishers stalled on other projects that you are running/supporting?
The JSON publisher is still in Alpha, which makes we wonder if there may still be some bugs that need addressing. Although the retry mechanism applies to all the publishers (including those not in Alpha)
errordetails1.txt (7.61 KB)
errordetails2.txt (7.5 KB)
We’ve actually narrowed down the problem to:
a) Specific submissions containing the “é” character in some fields
b) Publishers pointing at OpenFn. If we point at our own endpoint then the publisher runs through fine (See attachment)
Having a look at our own endpoint shows that the “é” character is replaced with the replacement character (Hex: EF BF BD, Unicode: U+FFFD)
We’re guessing this happens in ODK’s publisher but are not sure - always difficult to trace exactly where encoding changes/problems occur.
So the revised question is: Would OpenFn’s inbox endpoint reject POSTs containing some form/encoding of the replacement character?
Just copying and pasting the character (�) and posting to OpenFn seems to work fine.
This is the text we think is causing our problem:
50 65 74 72 EF BF BD 20 4C 65 20 52 6F 75 78
Petr� Le Roux
Last comment, it may well be related to something like this on the ODK side, but we’re still wondering if/why OpenFn is discarding (not responding to) the POSTs.
I can’t get past what you’ve already confirmed, that Open Function responds with a 200 and an empty JSON object to all three examples you’ve given. (See attached.) I’m off to the ODK GitHub repo to see what the exact mechanism is for posting data.
Update here: https://github.com/opendatakit/opendatakit/issues/1283
Software and hardware versions
Aggregate v1.4.7 Production
The ODK Aggregate JSON Publisher appears to use Latin-1 encoding, despite seeming to demand UTF-8. This causes failures at least when sending survey data containing the special character
65 72 5f 6e 61 6d 65 22 e r _ n a m e "
3a 22 54 e9 6c 6f 72 22 : " T . l o r "
7d 5d 7d } ] }
String Entity toString doesn’t have charset
I’ll submit a PR right now to show the proposed change, but I can’t run this on my local machine yet.
Steps to reproduce the problem1. Spin up Aggregate on Google AppSpot
- Set up a form with an active JSON Alpha Publisher that sends data to a server which will give you access to the binary. I use https://ngrok.com/ so inspect the requests.
- Submit data via ODK Collect which includes a special character. I’ve used “Télor” in this example.
- Inspect the POST coming from Aggregate and note that the
é is encoded as
e9 (binary in hex), rather than
c3 a9. (Inspector image below.)
Note that while it’s less obvious, you can tell the difference using Requestb.in. In the image below you’ll see that the second post (from ODK Aggregate) is Latin-1 encoded—we know this because it only contains 435 bytes, while the first post, made by me using UTF-8 encoding, is 436 bytes.
Posts to JSON servers should use UTF-8 encoding.
45 6e 74 65 72 5f 6e 61 E n t e r _ n a
6d 65 22 3a 22 54 c3 a9 m e " : " T . .
6c 6f 72 22 7d 5d 7d l o r " } ] }
I’m wondering if this is an issue with ODK Aggregate, or with Tomcat. It seems that you can set the request headers, but I’m not a Java developer and haven’t spun up ODK Aggregate on my machine.
These are what seem to be the relevant bits:
- Defining UTF8_ENCODE in HtmlConsts
- Create submissionJsonObj
- Send the post using HttpEntity
- String Entity toString doesn’t have charset—this might be the culprit
@acawood777 thanks for bringing this to my attention
@rorymckinley thanks for digging in with me
Submitted a PR to ODK Aggregate, and also looking into how we could (without getting really dirty) convert Latin-1 encoding to UTF-8 in our Phoenix plug.
Fantastic work, thanks for tracking that down! Please keep us posted on
whether you're able to come up with a workaround on your end.
We'd even be happy with a replacement character being populated instead of
the é initially, as long as OpenFn responds with a 200 so the ODK publisher
doesn't get stuck on that submission.
We've updated our ODK form so that there is no longer an é-character in one
of the dropdown lists, but it's only a matter of time before someone uses a
character like that in a text field somewhere!