Recover ArrayBuffer From xhr.responseText
Solution 1:
I ran into the same issue.
Here is the solution (tested on Chrome 68.0.3440.84):
let url = ''
// Char codes where ISO-8859-15 differs from the raw byte value: { decoded char code: original byte }
let iso_8859_15_table = { 338: 188, 339: 189, 352: 166, 353: 168, 376: 190, 381: 180, 382: 184, 8364: 164 }

function iso_8859_15_to_uint8array(iso_8859_15_str) {
  let buf = new ArrayBuffer(iso_8859_15_str.length);
  let bufView = new Uint8Array(buf);
  for (let i = 0, strLen = iso_8859_15_str.length; i < strLen; i++) {
    let octet = iso_8859_15_str.charCodeAt(i);
    // Map the eight remapped characters back to their original byte
    if (iso_8859_15_table.hasOwnProperty(octet))
      octet = iso_8859_15_table[octet]
    bufView[i] = octet;
    // Anything still outside 0..255 cannot have come from an ISO-8859-15 decode
    if (octet < 0 || 255 < octet)
      console.error(`invalid data error`)
  }
  return bufView
}

let req = new XMLHttpRequest();
// Force the response to be decoded with a charset that can be reversed
req.overrideMimeType('text/plain; charset=ISO-8859-15');
req.onload = () => {
  console.log(`Uint8Array : `)
  var uint8array = iso_8859_15_to_uint8array(req.responseText)
  console.log(uint8array)
}
req.open("get", url);
req.send();
Below is an explanation of what I learned while solving it.
Explanation
Why are some parts way off?
Because the TextDecoder causes data loss (in your case, the UTF-8 decoder).
Take UTF-8 as an example: it is a variable-width character encoding for Unicode.
It has rules, which follow from properties such as its variable length and ASCII compatibility, and these rules are what become the problem. A decoder may replace non-conforming byte sequences with a replacement character such as U+003F (?, question mark) or U+FFFD (�, Unicode replacement character).
In the UTF-8 case, byte values 0~127 are stable and 128~255 are unstable: a byte in 128~255 that is not part of a valid multi-byte sequence is converted to U+FFFD.
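As a small illustration of that loss, the sketch below decodes three bytes with the standard TextDecoder and then re-encodes the result. The byte values are arbitrary examples chosen for this demonstration, not data from the original question.

```javascript
// 0x41 is ASCII 'A'; 0x80 and 0xFF are not valid UTF-8 on their own.
const original = new Uint8Array([0x41, 0x80, 0xff]);

const decoded = new TextDecoder('utf-8').decode(original);
const codePoints = Array.from(decoded, (ch) => ch.codePointAt(0).toString(16));
console.log(codePoints); // [ '41', 'fffd', 'fffd' ]

// Re-encoding does not give the original bytes back:
// each U+FFFD becomes the three bytes EF BF BD.
const reencoded = new TextEncoder().encode(decoded);
console.log(reencoded.length); // 7
```

The original 0x80 and 0xFF are gone after decoding, so nothing applied to the string afterwards can bring them back.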
Are text decoders other than UTF-8 safe?
No. In most cases they are not safe from their own rules either.
UTF-8 itself is unrecoverable (bytes 128~255 collapse to U+FFFD).
If the binary data and the decoded result correspond one-to-one, the data can be recovered.
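To see why a one-to-one correspondence matters, here is a minimal sketch in which two different inputs produce the same decoded string under UTF-8. The variable names are only for illustration.

```javascript
const utf8 = new TextDecoder('utf-8');

// Two different single-byte inputs...
const fromByte80 = utf8.decode(new Uint8Array([0x80]));
const fromByteFF = utf8.decode(new Uint8Array([0xff]));

// ...decode to the same string, so no inverse mapping can exist.
console.log(fromByte80 === fromByteFF); // true
```

Once two bytes share a decoded result, a recover table cannot tell them apart.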
How to solve it?
- Find the recoverable text decoders.
- Force the MIME type of the incoming data to a recoverable charset:
  xhr_object.overrideMimeType('text/plain; charset=ISO-8859-15')
- Recover the binary data from the string with a recover table when it is received.
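The steps above can be checked without a network request. The sketch below is only an approximation: it assumes that decoding with TextDecoder('iso-8859-15') gives the same string the browser would put in responseText after the MIME override, and `stringToBytes` is a hypothetical helper name, not part of any API. The table values are the ones used at the top of this answer.

```javascript
// { decoded char code: original byte } for the eight remapped ISO-8859-15 characters
const recoverTable = { 338: 188, 339: 189, 352: 166, 353: 168, 376: 190, 381: 180, 382: 184, 8364: 164 };

// Hypothetical helper: turn an ISO-8859-15-decoded string back into bytes.
function stringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    const byte = Object.prototype.hasOwnProperty.call(recoverTable, code)
      ? recoverTable[code]
      : code;
    if (byte > 255) {
      throw new RangeError(`char code ${code} at index ${i} is not recoverable`);
    }
    bytes[i] = byte;
  }
  return bytes;
}

// Stand-in for the response body: every possible byte value once.
const sent = Uint8Array.from({ length: 256 }, (_, i) => i);
// Stand-in for responseText (assumption: same decoding as the XHR override).
const text = new TextDecoder('iso-8859-15').decode(sent);

const received = stringToBytes(text);
const identical = received.every((b, i) => b === sent[i]);
console.log(identical);
```

If `identical` is true in your environment, every byte value survives the round trip through the string.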
Find recoverable text decoders
To be recoverable, a decoder must never produce the same decoded result for two different bytes.
The following code is a simple example; it may miss some recoverable text decoders because it only considers single bytes from a Uint8Array.
let bufferView = new Uint8Array(256);
for (let i = 0; i < 256; i++)
  bufferView[i] = i;

let recoverable = []
let decoding = ['utf-8', 'ibm866', 'iso-8859-2', 'iso-8859-3', 'iso-8859-4', 'iso-8859-5', 'iso-8859-6', 'iso-8859-7', 'iso-8859-8', 'iso-8859-8i', 'iso-8859-10', 'iso-8859-13', 'iso-8859-14', 'iso-8859-15', 'iso-8859-16', 'koi8-r', 'koi8-u', 'macintosh', 'windows-874', 'windows-1250', 'windows-1251', 'windows-1252', 'windows-1253', 'windows-1254', 'windows-1255', 'windows-1256', 'windows-1257', 'windows-1258', 'x-mac-cyrillic', 'gbk', 'gb18030', 'hz-gb-2312', 'big5', 'euc-jp', 'iso-2022-jp', 'shift-jis', 'euc-kr', 'iso-2022-kr', 'utf-16be', 'utf-16le', 'x-user-defined', 'ISO-2022-CN', 'ISO-2022-CN-ext']

for (let dec of decoding) {
  try {
    let decodedText = new TextDecoder(dec).decode(bufferView);
    let loss = 0           // positions whose char code differs from the byte value
    let recoverTable = {}  // { decoded char code: original byte }
    let unrecoverable = 0  // char codes produced by more than one byte
    for (let i = 0; i < decodedText.length; i++) {
      let charCode = decodedText.charCodeAt(i)
      if (charCode != i)
        loss++
      // Test with `in`, not truthiness: byte 0 is a valid (but falsy) table value
      if (!(charCode in recoverTable))
        recoverTable[charCode] = i
      else
        unrecoverable++
    }
    let tableCnt = Object.keys(recoverTable).length
    if (tableCnt == 256 && unrecoverable == 0) {
      recoverable.push(dec)
      // Deferred so that recoverable charsets are listed after the "!!" lines
      setTimeout(() => {
        console.log(`[${dec}] : err(${loss}/${decodedText.length}, ${Math.round(loss / decodedText.length * 100)}%) alive(${tableCnt}) unrecoverable(${unrecoverable})`)
      }, 10)
    }
    else {
      console.log(`!! [${dec}] : err(${loss}/${decodedText.length}, ${Math.round(loss / decodedText.length * 100)}%) alive(${tableCnt}) unrecoverable(${unrecoverable})`)
    }
  } catch (e) {
    // The TextDecoder constructor throws for labels it does not recognize
    console.log(`!! [${dec}] : not supported.`)
  }
}
setTimeout(() => {
  console.log(`recoverable Charset : ${recoverable}`)
}, 10)
In my console, this prints:
recoverable Charset : ibm866,iso-8859-2,iso-8859-4,iso-8859-5,iso-8859-10,iso-8859-13,iso-8859-14,iso-8859-15,iso-8859-16,koi8-r,koi8-u,macintosh,windows-1250,windows-1251,windows-1252,windows-1254,windows-1256,windows-1258,x-mac-cyrillic,x-user-defined
I used iso-8859-15 at the beginning of this answer because it has the smallest table size.
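The recover table does not have to be typed by hand. As a rough sketch, it can be derived from the decoder itself by decoding all 256 byte values and keeping only the entries where the char code differs from the byte. `buildRecoverTable` is a hypothetical helper name, and the code assumes the runtime's TextDecoder supports the given label.

```javascript
// Hypothetical helper: derive { decoded char code: original byte } for a single-byte charset.
function buildRecoverTable(label) {
  const bytes = Uint8Array.from({ length: 256 }, (_, i) => i);
  const text = new TextDecoder(label).decode(bytes);
  if (text.length !== 256) {
    throw new Error(`${label} does not decode one byte to one char code`);
  }
  const seen = new Set();
  const table = {};
  for (let byte = 0; byte < 256; byte++) {
    const code = text.charCodeAt(byte);
    if (seen.has(code)) {
      throw new Error(`${label} maps more than one byte to char code ${code}`);
    }
    seen.add(code);
    // Identity entries are omitted; a missing key means "char code equals byte".
    if (code !== byte) table[code] = byte;
  }
  return table;
}

const table = buildRecoverTable('iso-8859-15');
console.log(Object.keys(table).length);
console.log(table);
```

For iso-8859-15 the result should match the eight-entry table at the top of this answer, and calling the helper with an unrecoverable label such as 'utf-8' throws instead of returning a misleading table.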
Additional test: comparing the UTF-8 and ISO-8859-15 results
Check that U+FFFD really disappears when using ISO-8859-15.
function requestAjax(url, charset) {
  let req = new XMLHttpRequest();
  if (charset)
    req.overrideMimeType(`text/plain; charset=${charset}`);
  else
    charset = 'utf-8';
  req.open('get', url);
  req.onload = () => {
    // Print the first 50 characters of the decoded response
    console.log(`==========\n${charset}`)
    console.log(`${req.responseText.split('', 50)}\n==========`);
    console.log('\n')
  }
  req.send();
}
var url = '';
requestAjax(url, 'ISO-8859-15');
requestAjax(url);
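Reading the two outputs by eye works for short responses. For longer ones, a small counter may be easier to compare. The helper below is a sketch with a hypothetical name, `countReplacementChars`, and could be called on req.responseText inside the onload handler.

```javascript
// Hypothetical helper: count U+FFFD replacement characters in a decoded string.
function countReplacementChars(text) {
  let count = 0;
  for (const ch of text) {
    if (ch === '\uFFFD') count++;
  }
  return count;
}

console.log(countReplacementChars('ok\uFFFD\uFFFD')); // 2
console.log(countReplacementChars('abc')); // 0
```

A non-zero count for the UTF-8 request and zero for the ISO-8859-15 request would confirm the difference.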
Bottom line
- Recovering binary data from a string takes some additional work:
  - Find a recoverable text encoder/decoder.
  - Make a recover table.
  - Recover with the table.
  - (You can refer to the code at the very top.)
- To use this trick, force the MIME type of the incoming data to the desired charset.