How to convert from Windows1251 String to UTF String? #18

Open
rpungin opened this issue Sep 13, 2023 · 1 comment

rpungin commented Sep 13, 2023

This is more of a question than an issue, but hopefully someone can answer it.

I am loading the HTML of a Russian webpage into a String variable using the http package, like so:

String html = (await http.Client().get(Uri.parse(url))).body;

The website's encoding is Windows-1251, so the html variable ends up containing text such as "Êàêèå". This is what I see when I print the variable.

So my question is: How do I convert that string to Cyrillic characters in Unicode encoding which should result in "Какие"?

I tried this:

import 'dart:convert';
import 'package:enough_convert/enough_convert.dart';

void main() {
  final html = "Êàêèå";
  final encoded = const Windows1251Codec().encode(html);
  final converted = const Utf8Codec().decode(encoded);
  print(converted);
}

But I get an error on this line: final encoded = const Windows1251Codec().encode(html);:

FormatException: Invalid value in input: "Ê" / (202) at index 0 of "Êàêèå"

Essentially, what I would like to do is convert "Êàêèå" to "Какие". The website https://convertcyrillic.com can do this. Here is the screenshot:

[Screenshot 2023-09-13 at 12 57 32 PM]

So how do I do this programmatically in Dart?

rpungin commented Sep 20, 2023

After extensive googling, I found the solution:

It turns out I have to encode the string with the Latin1 encoder first, and then decode the resulting bytes with the Windows1251 codec:

import 'package:enough_convert/enough_convert.dart';

void main() {
  final html = "Êàêèå";
  final encoded = const Latin1Codec().encode(html);
  final converted = const Windows1251Codec().decode(encoded);
  print(converted);
}

This prints out "Какие".
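
Why the Latin1 round trip works can be sketched as follows. The assumption here is that the response carried no charset, so the http package fell back to Latin-1 when building the body string. Latin-1 maps each byte to exactly one character, so encoding the garbled string back to Latin-1 recovers the original Windows-1251 bytes, which can then be decoded correctly. The helper names decodeWindows1251 and repairLatin1Mojibake are hypothetical, made up for this illustration and not part of enough_convert:

```dart
import 'dart:convert' show latin1;
import 'package:enough_convert/enough_convert.dart' show Windows1251Codec;

/// Hypothetical helper: decodes raw Windows-1251 bytes into a Dart string.
String decodeWindows1251(List<int> bytes) =>
    const Windows1251Codec().decode(bytes);

/// Hypothetical helper: repairs text that was wrongly decoded as Latin-1.
/// Assumes every character in [garbled] fits in one Latin-1 byte.
String repairLatin1Mojibake(String garbled) =>
    decodeWindows1251(latin1.encode(garbled));

void main() {
  // The five bytes a Windows-1251 page sends for the word "Какие".
  const rawBytes = [0xCA, 0xE0, 0xEA, 0xE8, 0xE5];

  // Read as Latin-1, the same bytes become the garbled text from the question.
  print(latin1.decode(rawBytes)); // Êàêèå

  // Read as Windows-1251, they become the intended Cyrillic text.
  print(decodeWindows1251(rawBytes)); // Какие

  // The round trip from the comment above, wrapped in a helper.
  print(repairLatin1Mojibake('Êàêèå')); // Какие
}
```

If the raw bytes are still available, it may be simpler to skip the round trip and decode them directly, for example by passing the response's bodyBytes (instead of body) to const Windows1251Codec().decode. That also avoids depending on which charset the http package happened to pick for the body string.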
