UTF-16 and UTF-8 encoding error in XmlImporter #9

Closed
numito opened this issue Aug 30, 2018 · 8 comments

@numito

numito commented Aug 30, 2018

XmlImporter throws an error while reading XML, depending on the version of Alma.
Alma may return an XML document with the wrong encoding label: the XML declaration says UTF-16, but the content is actually UTF-8.
A workaround is to catch this error, change the encoding in the XML declaration from UTF-16 to UTF-8 using preg_replace, and then parse the XML again.

See attached XMLImporter file
XmlImporter.php.zip
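
The workaround described above could look roughly like the following. This is a minimal sketch, not the code from the attached file: the function name `loadXmlWithEncodingFallback` is hypothetical, and it assumes the document is parsed with `simplexml_load_string` and that the only problem is a UTF-16 label on UTF-8 content.

```php
<?php
// Hypothetical helper illustrating the workaround; not part of scriptotek/marc.
function loadXmlWithEncodingFallback($xml)
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = simplexml_load_string($xml);

    if ($doc === false) {
        // Assumption: the failure is caused by the mislabelled encoding.
        // Rewrite only the first encoding declaration, then retry once.
        $patched = preg_replace(
            '/(<\?xml[^>]*\bencoding\s*=\s*["\'])UTF-16(["\'])/i',
            '${1}UTF-8${2}',
            $xml,
            1
        );
        libxml_clear_errors();
        $doc = simplexml_load_string($patched);
    }

    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false) {
        $message = $errors ? trim($errors[0]->message) : 'unknown error';
        throw new RuntimeException('Failed loading XML: ' . $message);
    }

    return $doc;
}

// Usage: UTF-8 content that is labelled UTF-16.
$xml = '<?xml version="1.0" encoding="UTF-16"?><record><title>Blåbær</title></record>';
$doc = loadXmlWithEncodingFallback($xml);
echo $doc->title, "\n";
```

The retry only happens after a failed parse, so documents with a correct declaration are never rewritten.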

@numito
Author

numito commented Aug 30, 2018

Oops, sorry, this should be in the Marc project, as it is a Marc issue. I am going to post a new issue on the Marc project.

@danmichaelo
Member

danmichaelo commented Sep 2, 2018

Just realized that I haven't made a release in a while, so the version on Packagist was quite outdated. I pushed version 0.7.0 now, which contains a lot of changes, most of which should be documented in the changelog.

Among other things, I have switched from using JSON to XML when it comes to the Bib API. One reason is the annoying UTF-16 thing, which is only present when using JSON. More importantly, though, updating a record only works when using XML.


Update: Wait, it seems like I also used XML in version 0.6.1. Anyways, please try the latest version and see if the issue is still there. If it is, can you check if you get UTF-16 if you request the same record using curl?

$ curl -X GET --header "Accept: application/xml; charset=utf-8" --header "Authorization: apikey ${ALMA_KEY}" "https://api-eu.hosted.exlibrisgroup.com/almaws/v1/bibs/${MMS_ID}"

In my case, I get UTF-8 as long as I use Accept: application/xml, but UTF-16 with Accept: application/json

@krifro

krifro commented Nov 8, 2018

Still a problem in 0.7.1. Note, the problem is not that Alma sends out UTF-16 - it never does. However, sometimes it claims that the UTF-8 it sends out is really UTF-16.
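
To tell a mislabelled response apart from real UTF-16, one could compare the declared encoding with the actual bytes. The sketch below is only an illustration: `describeXmlEncoding` is a hypothetical name, and it assumes that real UTF-16 would start with a byte order mark, which is not guaranteed for every producer.

```php
<?php
// Hypothetical diagnostic helper; not part of php-alma-client or scriptotek/marc.
function describeXmlEncoding($body)
{
    // Encoding named in the XML declaration, if there is one.
    $declared = null;
    if (preg_match('/^\s*<\?xml[^>]*\bencoding\s*=\s*["\']([^"\']+)["\']/i', $body, $m)) {
        $declared = strtoupper($m[1]);
    }

    // Assumption: genuine UTF-16 starts with a BOM (FE FF or FF FE).
    $bom = substr($body, 0, 2);
    $hasUtf16Bom = ($bom === "\xFE\xFF" || $bom === "\xFF\xFE");

    // PCRE's /u modifier rejects byte sequences that are not valid UTF-8.
    $validUtf8 = (preg_match('//u', $body) === 1);

    return [
        'declared'      => $declared,
        'has_utf16_bom' => $hasUtf16Bom,
        'valid_utf8'    => $validUtf8,
        'mislabelled'   => ($declared === 'UTF-16' && !$hasUtf16Bom && $validUtf8),
    ];
}

// Usage: inspect a response body saved from the API.
$info = describeXmlEncoding('<?xml version="1.0" encoding="UTF-16"?><record/>');
var_dump($info['mislabelled']);
```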

@danmichaelo
Member

@krifro, I'm aware that Alma sends the wrong encoding when requesting JSON, but I have never observed it when requesting XML, which is what php-alma-client has done since version 0.7.0. Can you please provide a minimal example that reproduces the problem? Or is it not reproducible?

@krifro

krifro commented Nov 9, 2018

@danmichaelo: I'm not used to GitHub, so I'm sorry if I shouldn't be doing this inline or something.

composer.json:
{
    "require": {
        "php-http/guzzle6-adapter": "^1.1",
        "symfony/options-resolver": "3.*",
        "scriptotek/alma-client": "^0.7.1"
    }
}

My project requires PHP 5.*, so I'm using 5.6.37 and needed to use options-resolver 3 rather than 4.

test.php:

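<?php
require_once('vendor/autoload.php');
use Scriptotek\Alma\Client as AlmaClient;

// API_KEY, MMS_ID and HOLDING_ID are placeholders for real values.
$alma = new AlmaClient(API_KEY, 'eu');
$bib = $alma->bibs->get(MMS_ID);
$holding = $bib->holdings[HOLDING_ID];
echo "Gets here, no worries.\n";
// Accessing the record triggers parsing of the holding's MARC XML, which fails.
$marcRecord = $holding->record;
echo "But not here.\n";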

I've tried this against two alma instances, using two different computers and three different operating systems, with different languages, but I keep getting:

Gets here, no worries.

Fatal error: Uncaught exception 'Scriptotek\Marc\Exceptions\XmlException' with message 'Failed loading XML: \nDocument labelled UTF-16 but has UTF-8 content
' in /private/tmp/alma/vendor/scriptotek/marc/src/Importers/XmlImporter.php:29
Stack trace:
#0 /private/tmp/alma/vendor/scriptotek/marc/src/Importers/Importer.php(27): Scriptotek\Marc\Importers\XmlImporter->__construct('<?xml version="...')
#1 /private/tmp/alma/vendor/scriptotek/marc/src/Collection.php(48): Scriptotek\Marc\Importers\Importer->fromString('<?xml version="...')
#2 /private/tmp/alma/vendor/scriptotek/marc/src/Record.php(68): Scriptotek\Marc\Collection::fromString('<?xml version="...')
#3 /private/tmp/alma/vendor/scriptotek/alma-client/src/Bibs/Holding.php(53): Scriptotek\Marc\Record::fromString('<?xml version="...')
#4 /private/tmp/alma/vendor/scriptotek/alma-client/src/Model/LazyResource.php(83): Scriptotek\Alma\Bibs\Holding->onData(Object(stdClass))
#5 /private/tmp/alma/vendor/scriptotek/alma-client/src/Bibs/Holding.php(71): Scriptotek\Alma\Model\LazyRe in /private/tmp/alma/vendor/scriptotek/marc/src/Importers/XmlImporter.php on line 29

I can iterate over the holding's items and from them fetch stuff like the holding ID, but fiddling with the holding itself breaks.

@danmichaelo
Member

Aaah, holding records! I rarely work with those myself, so I probably forgot to test them. Will fix!

@danmichaelo
Member

@krifro, give 0.7.2 a try 🚀

@krifro

krifro commented Nov 9, 2018

I'll do so come Monday, I don't have access from home. Thank you for looking into this so fast! :)

Edit: Tested, works! The framework can now fetch and parse a Holding record. Unfortunately, we still won't be able to use the framework, it seems:

Note: Editing holding records is not yet supported. Will be added in the future.
