12-15-2012, 08:35 PM | #1
ANewCoder
Using Java to Gather Information from a Website

I'm a relatively new programmer and need help with Java and HTTP.

I'm working on a pet project in which the user enters a word (in theory in any language, but I mainly have Russian in mind) and the program then fetches the Wiktionary page that lists the verb's conjugation. For example, if I enter the Russian word хотеть (to want), my program reads the Wiktionary page and prints many forms of the verb, such as the past, present, and future tenses. However, I have run into a small problem: all the Russian words returned are displayed only as "????????". How do I retrieve the actual Russian words?

I've already written some code, mainly using the jsoup library. Any help would be appreciated, thanks.
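import java.io.IOException;
import java.util.Scanner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        Scanner s = new Scanner(System.in);
        System.out.print("Enter website:");
        String link = s.nextLine(); // The website you want, without the scheme
        link = "http://" + link;
        try {
            // Fetch the page and parse it into a DOM
            Document doc = Jsoup.connect(link).get();
            // All the information is embedded in a table, so collect every row
            Elements rows = doc.getElementsByTag("tr");
            System.out.println("--------------------");
            for (Element row : rows) {
                // Print the visible text of each row
                System.out.println(row.text());
            }
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}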
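
One possible cause, assuming the page itself is fetched and parsed correctly, is that System.out encodes text with a platform default charset that has no Cyrillic characters, so each one is replaced by "?". The sketch below is only an illustration of that idea: the class name Utf8Output and the helper encode are hypothetical, and it is an assumption that the console in use can display UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class Utf8Output {
    // Hypothetical helper: writes text through a PrintStream that uses the
    // named charset and returns the bytes that were actually produced.
    static byte[] encode(String text, String charsetName) throws UnsupportedEncodingException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, charsetName);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The Russian verb from the post, written with Unicode escapes so the
        // example does not depend on the encoding of the source file.
        String word = "\u0445\u043e\u0442\u0435\u0442\u044c";

        // A charset without Cyrillic replaces every character with '?'.
        byte[] ascii = encode(word, "US-ASCII");
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // prints "??????"

        // Wrapping System.out in a UTF-8 PrintStream keeps the characters intact,
        // assuming the console is set up to display UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(word);
    }
}
```

If this is the cause, the same wrapped stream could replace System.out in the loop that prints item.text(). Whether the terminal or IDE console renders the result correctly is a separate setting outside the program.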