Foreign Characters in Android & Java

I am trying to download and parse a web page that contains foreign (Chinese) characters. I am not sure whether I should decode it as "utf-8" or as something else, but nothing I have tried works. I used the sample Wiktionary code for getUrlContent().

public void onCreate(Bundle savedInstanceState) {
    mText = (TextView) findViewById(;
    String test = new String("fail");

    try {
        test = getUrlContent("");
    } catch (ApiException e) {
        // TODO Auto-generated catch block
    }
    byte[] b = new byte[100000];

    try {
        b = test.getBytes("utf-8");
    } catch (UnsupportedEncodingException e) {
        // TODO Auto-generated catch block
    }

    char[] charArr = (new String(b)).toCharArray();
    CharSequence seq = java.nio.CharBuffer.wrap(charArr);

    mText.setText(charArr, 0, 1000);//.setText(seq);
}

protected static synchronized String getUrlContent(String url) throws ApiException {
    if (sUserAgent == null) {
        throw new ApiException("User-Agent string must be prepared");
    }

    // Create client and set our specific user-agent string
    HttpClient client = new DefaultHttpClient();
    HttpGet request = new HttpGet(url);
    request.setHeader("User-Agent", sUserAgent);

    try {
        HttpResponse response = client.execute(request);

        // Check if server response is valid
        StatusLine status = response.getStatusLine();
        if (status.getStatusCode() != HTTP_STATUS_OK) {
            throw new ApiException("Invalid response from server: " +
                    status.toString());
        }

        // Pull content stream from response
        HttpEntity entity = response.getEntity();
        InputStream inputStream = entity.getContent();

        ByteArrayOutputStream content = new ByteArrayOutputStream();

        // Read response into a buffered stream
        int readBytes = 0;
        while ((readBytes = inputStream.read(sBuffer)) != -1) {
            content.write(sBuffer, 0, readBytes);
        }

        // Return result from buffered stream
        return new String(content.toByteArray(), "utf-8");
    } catch (IOException e) {
        throw new ApiException("Problem communicating with API", e);
    }
}

Asked by: Roman815 | Posted: 24-01-2022

Answer 1

The charset is defined in the page itself:

<meta http-equiv="Content-Type" content="text/html; charset=gb2312" /> 
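
A minimal sketch of why the declared charset matters, assuming the response bytes really are GB2312 as the meta tag says. The class name DeclaredCharsetDemo, the decode helper, and the sample text are made up for illustration; they are not part of the question's code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DeclaredCharsetDemo {
    // Decode raw response bytes with the charset the page declares.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        // Stand-in for the bytes a GB2312 page would send over the wire.
        byte[] raw = original.getBytes(Charset.forName("GB2312"));

        String right = decode(raw, "GB2312");
        String wrong = new String(raw, StandardCharsets.UTF_8);

        System.out.println("GB2312 decode matches: " + right.equals(original));
        System.out.println("UTF-8 decode matches:  " + wrong.equals(original));
    }
}
```

In getUrlContent() from the question, this would correspond to passing "gb2312" instead of "utf-8" to the String constructor in the return statement.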

In general, there are three ways to declare the encoding of an HTML page served over HTTP:

1. The Content-Type header of the HTTP response

Content-Type: text/html; charset=utf-8

2. The encoding pseudo-attribute in the XML declaration

<?xml version="1.0" encoding="utf-8" ?>

3. A meta tag inside the head element

<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />

See Character Encodings for details.

So you should check each possible declaration to find the right encoding. One approach is to start parsing the page as utf-8 and restart with the declared charset if you encounter a Content-Type meta tag that names a different one.
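
The lookup order described above could be sketched roughly as follows. This is only an illustration under stated assumptions: CharsetSniffer, findCharset, and decode are hypothetical names, the regular expression is a simplification rather than a full HTML parser, and the 2048-byte scan window is an arbitrary choice. It also scans the start of the body as ISO-8859-1 instead of utf-8, a small variation on the restart idea: the declarations themselves are ASCII, so a byte-preserving decode is enough to locate them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetSniffer {
    // Matches charset=... (HTTP header or meta tag) and
    // encoding="..." (XML declaration). Simplified on purpose.
    private static final Pattern CHARSET = Pattern.compile(
            "(?:charset|encoding)\\s*=\\s*[\"']?([\\w.:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the first declared charset name in the text, or null.
    static String findCharset(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // contentTypeHeader may be null when the server sends none.
    static String decode(byte[] body, String contentTypeHeader) {
        // 1. Prefer the HTTP Content-Type header.
        String name = findCharset(contentTypeHeader);
        if (name == null) {
            // 2. Otherwise look for an XML declaration or meta tag
            //    near the start of the document.
            int len = Math.min(body.length, 2048);
            String head = new String(body, 0, len, StandardCharsets.ISO_8859_1);
            name = findCharset(head);
        }
        Charset charset = StandardCharsets.UTF_8; // fallback when nothing is declared
        if (name != null) {
            try {
                charset = Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Unknown or malformed name: keep the UTF-8 fallback.
            }
        }
        return new String(body, charset);
    }

    public static void main(String[] args) {
        String page = "<meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=gb2312\" />\u4e2d\u6587";
        byte[] body = page.getBytes(Charset.forName("GB2312"));
        System.out.println(decode(body, null).equals(page));
    }
}
```

With the question's code, the header value would come from the HTTP response and the byte array from content.toByteArray(), so the charset is chosen before the String is built rather than being hard-coded.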

Answered by: Arthur260 | Posted: 25-02-2022

Answer 2

Try the GuessEncoding library. It is not 100% bulletproof, but it often helps.

Answered by: Thomas99 | Posted: 25-02-2022
