Recover ArrayBuffer From xhr.responseText

January 31, 2024

I need to get an array buffer from an http request sending me a base64 answer. For this request, I can't use XMLHttpRequest.responseType='arraybuffer'. The response I get from this

Solution 1:

I ran into the same issue.

The solution (tested in Chrome 68.0.3440.84):

let url = 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==';

// Recover table: decoded character code -> original byte value.
// Only the characters whose code differs from the byte need an entry.
let iso_8859_15_table = { 338: 188, 339: 189, 352: 166, 353: 168, 376: 190, 381: 180, 382: 184, 8364: 164 };

// Convert a string that was decoded as ISO-8859-15 back into the original bytes.
function iso_8859_15_to_uint8array(iso_8859_15_str) {
    let buf = new ArrayBuffer(iso_8859_15_str.length);
    let bufView = new Uint8Array(buf);
    for (let i = 0, strLen = iso_8859_15_str.length; i < strLen; i++) {
        let octet = iso_8859_15_str.charCodeAt(i);
        if (iso_8859_15_table.hasOwnProperty(octet))
            octet = iso_8859_15_table[octet];
        // Report out-of-range values before Uint8Array silently truncates them.
        if (octet < 0 || 255 < octet)
            console.error(`invalid data error`);
        bufView[i] = octet;
    }
    return bufView;
}

let req = new XMLHttpRequest();
// Force the response to be decoded with a recoverable charset.
req.overrideMimeType('text/plain; charset=ISO-8859-15');
req.onload = () => {
    console.log(`Uint8Array : `);
    // uint8array.buffer is the underlying ArrayBuffer.
    let uint8array = iso_8859_15_to_uint8array(req.responseText);
    console.log(uint8array);
};
req.open("get", url);
req.send();

Below is an explanation of what I learned while solving it.

Explanation

Why are some parts way off?

Because the text decoder loses data (in your case the decoder is UTF-8).

Take UTF-8 as an example:

- It is a variable-width character encoding for Unicode.
- It has rules about which byte sequences are valid (this is what causes the problem), for reasons such as its variable-length design and ASCII compatibility.
- So a decoder may replace non-conforming bytes with a replacement character such as U+003F (?, question mark) or U+FFFD (�, Unicode replacement character).
- In the UTF-8 case, byte values 0~127 are stable and 128~255 are unstable: a byte in 128~255 that is not part of a valid multi-byte sequence is converted to U+FFFD.
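
To make the data loss concrete, here is a minimal sketch that decodes the eight-byte PNG signature as UTF-8 with TextDecoder. The variable names are only illustrative. The first byte (0x89) is in the unstable range and is not part of a valid sequence, so it comes back as U+FFFD, while the ASCII bytes survive.

```javascript
// The first eight bytes of a PNG file (the PNG signature).
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Decode the bytes as UTF-8.
const asUtf8 = new TextDecoder('utf-8').decode(pngSignature);

// Collect the code unit of every decoded character.
const codeUnits = Array.from(asUtf8, (ch) => ch.charCodeAt(0));

// The first entry is 'fffd' instead of '89'; the original byte is gone.
console.log(codeUnits.map((c) => c.toString(16)));
```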

Are text decoders other than UTF-8 safe?

No. In most cases they are not safe either, because they have rules of their own.

UTF-8 itself is unrecoverable (unstable bytes in 128~255 are set to U+FFFD).

If the binary data and the decoded result correspond one-to-one, the data can be recovered.
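
The one-to-one requirement can be illustrated with a small sketch. The helper name collides is hypothetical. It decodes two different byte sequences and reports whether they produce the same text; when they do, nothing in the string tells you which bytes you started with.

```javascript
// Returns true when two byte sequences decode to the same text,
// meaning the original bytes cannot be told apart afterwards.
function collides(bytesA, bytesB, label) {
    const decoder = new TextDecoder(label);
    return decoder.decode(bytesA) === decoder.decode(bytesB);
}

// 0x80 and 0xFF are both invalid on their own in UTF-8,
// so both decode to the single character U+FFFD.
const utf8Collision = collides(new Uint8Array([0x80]), new Uint8Array([0xff]), 'utf-8');

// Two different ASCII bytes stay distinguishable.
const asciiCollision = collides(new Uint8Array([0x41]), new Uint8Array([0x42]), 'utf-8');

console.log(utf8Collision, asciiCollision);
```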

How to solve it?

- Find the recoverable text decoders.
- Force the MIME type of the incoming data to a recoverable charset: xhr_object.overrideMimeType('text/plain; charset=ISO-8859-15')
- Recover the binary data from the string with a recover table once it is received.
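
The three steps above can be sketched generically. The following is a rough sketch, not a definitive implementation: the helper names buildRecoverTable and recoverBytes are hypothetical, and it assumes the runtime's TextDecoder supports the given charset label (browsers do; Node.js needs a build with full ICU data).

```javascript
// Decode every byte value once and remember which character code
// each byte turned into. Throws if two bytes map to the same code.
function buildRecoverTable(label) {
    const allBytes = Uint8Array.from({ length: 256 }, (_, i) => i);
    const decoded = new TextDecoder(label).decode(allBytes);
    const table = new Map(); // decoded char code -> original byte
    for (let i = 0; i < decoded.length; i++) {
        const code = decoded.charCodeAt(i);
        if (table.has(code)) {
            throw new Error(`${label} is not recoverable: duplicate code ${code}`);
        }
        table.set(code, i);
    }
    if (table.size !== 256) {
        throw new Error(`${label} is not recoverable: only ${table.size} codes`);
    }
    return table;
}

// Map every character of the received string back to its byte.
function recoverBytes(text, table) {
    const bytes = new Uint8Array(text.length);
    for (let i = 0; i < text.length; i++) {
        const byte = table.get(text.charCodeAt(i));
        if (byte === undefined) {
            throw new Error(`unexpected character code at index ${i}`);
        }
        bytes[i] = byte;
    }
    return bytes;
}
```

Inside an onload handler this could be used as recoverBytes(req.responseText, buildRecoverTable('iso-8859-15')).buffer to obtain the ArrayBuffer, provided overrideMimeType was called with the same charset.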

Finding recoverable text decoders

To be recoverable, no two byte values may decode to the same character.

The following code is a simple example. It only considers a Uint8Array, so it may miss some recoverable text decoders.

// Build a buffer containing every possible byte value, 0 through 255.
let bufferView = new Uint8Array(256);
for (let i = 0; i < 256; i++)
    bufferView[i] = i;

let recoverable = [];
let decoding = ['utf-8', 'ibm866', 'iso-8859-2', 'iso-8859-3', 'iso-8859-4', 'iso-8859-5', 'iso-8859-6', 'iso-8859-7', 'iso-8859-8', 'iso-8859-8i', 'iso-8859-10', 'iso-8859-13', 'iso-8859-14', 'iso-8859-15', 'iso-8859-16', 'koi8-r', 'koi8-u', 'macintosh', 'windows-874', 'windows-1250', 'windows-1251', 'windows-1252', 'windows-1253', 'windows-1254', 'windows-1255', 'windows-1256', 'windows-1257', 'windows-1258', 'x-mac-cyrillic', 'gbk', 'gb18030', 'hz-gb-2312', 'big5', 'euc-jp', 'iso-2022-jp', 'shift-jis', 'euc-kr', 'iso-2022-kr', 'utf-16be', 'utf-16le', 'x-user-defined', 'ISO-2022-CN', 'ISO-2022-CN-ext'];
for (let dec of decoding) {
    try {
        let decodedText = new TextDecoder(dec).decode(bufferView);
        let loss = 0;           // characters whose code differs from their index
        let recoverTable = {};  // decoded character code -> original byte value
        let unrecoverable = 0;  // characters that duplicate an earlier one
        for (let i = 0; i < decodedText.length; i++) {
            let charCode = decodedText.charCodeAt(i);
            if (charCode != i)
                loss++;

            // hasOwnProperty, so a stored byte value of 0 is not mistaken for "missing".
            if (!recoverTable.hasOwnProperty(charCode))
                recoverTable[charCode] = i;
            else
                unrecoverable++;
        }
        let tableCnt = Object.keys(recoverTable).length;
        if (tableCnt == 256 && unrecoverable == 0) {
            recoverable.push(dec);
            // Deferred so the recoverable charsets are printed after the failures.
            setTimeout(() => {
                console.log(`[${dec}] : err(${loss}/${decodedText.length}, ${Math.round(loss / decodedText.length * 100)}%) alive(${tableCnt}) unrecoverable(${unrecoverable})`);
            }, 10);
        }
        else {
            console.log(`!! [${dec}] : err(${loss}/${decodedText.length}, ${Math.round(loss / decodedText.length * 100)}%) alive(${tableCnt}) unrecoverable(${unrecoverable})`);
        }
    } catch (e) {
        console.log(`!! [${dec}] : not supported.`);
    }
}

setTimeout(() => {
    console.log(`recoverable Charset : ${recoverable}`);
}, 10);

In my console, this prints:

recoverable Charset : ibm866,iso-8859-2,iso-8859-4,iso-8859-5,iso-8859-10,iso-8859-13,iso-8859-14,iso-8859-15,iso-8859-16,koi8-r,koi8-u,macintosh,windows-1250,windows-1251,windows-1252,windows-1254,windows-1256,windows-1258,x-mac-cyrillic,x-user-defined

I used iso-8859-15 at the beginning of this answer. (It has the smallest table size.)
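
The list above also contains x-user-defined, which can be handled without any table. This is an alternative to the ISO-8859-15 approach used in this answer, sketched here under the assumption that the response was requested with overrideMimeType('text/plain; charset=x-user-defined'). In that charset, bytes 0x00 to 0x7F decode to themselves and bytes 0x80 to 0xFF decode to U+F780 to U+F7FF, so the low 8 bits of each character code are the original byte. The function name is hypothetical.

```javascript
// Convert a string decoded as x-user-defined back into the original bytes.
function xUserDefinedToUint8Array(text) {
    const bytes = new Uint8Array(text.length);
    for (let i = 0; i < text.length; i++) {
        // Keep only the low 8 bits: 0xF789 -> 0x89, 0x0050 -> 0x50.
        bytes[i] = text.charCodeAt(i) & 0xff;
    }
    return bytes;
}

// Simulated responseText for the bytes 0x89 0x50 0xFF 0x00.
const simulatedText = String.fromCharCode(0xf789, 0x50, 0xf7ff, 0x00);
const recovered = xUserDefinedToUint8Array(simulatedText);
console.log(recovered);
```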

Additional test: comparing the UTF-8 and ISO-8859-15 results

Check that U+FFFD really disappears when using ISO-8859-15.

function requestAjax(url, charset) {
    let req = new XMLHttpRequest();
    if (charset)
        req.overrideMimeType(`text/plain; charset=${charset}`);
    else
        charset = 'utf-8';
    req.open('get', url);
    req.onload = () => {
        console.log(`==========\n${charset}`)
        console.log(`${req.responseText.split('', 50)}\n==========`);
        console.log('\n')
    }
    req.send();
}

var url = 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==';
requestAjax(url, 'ISO-8859-15');
requestAjax(url);

Bottom line

Converting binary data to and from a string takes some additional work:
- Find a recoverable text encoder/decoder.
- Make a recover table.
- Recover the data with the table.
- (You can refer to the code at the very top.)
To use this trick, force the MIME type of the incoming data to the desired charset.
