CodingForums.com > Server side development > Java and JSP

12-15-2012, 08:35 PM   #1
ANewCoder
Using Java to Gather Information from a Website

I'm a relatively new programmer and need help with Java and HTTP.

I'm working on a pet project where the user enters a word (in principle in any language, though I mainly have Russian in mind) and the program looks up that word's Wiktionary page, which lists the verb's conjugation. For example, if I enter the Russian verb хотеть (to want), my program should scrape the Wiktionary page and print the verb's forms, such as the past, present, and future tenses. However, I've run into a problem: all the Russian words come back as "????????". How do I get the actual Russian text?
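For the lookup step described above (going from a typed word to its Wiktionary page), a minimal sketch might look like the following. The class name `WiktionaryUrl`, the method `pageUrlFor`, and the base URL `https://en.wiktionary.org/wiki/` are assumptions made for illustration and do not come from the post. The sketch only shows that a Cyrillic word has to be percent-encoded as UTF-8 before it goes into a URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class WiktionaryUrl {

    // Assumed base URL of English Wiktionary entries; not taken from the post.
    static final String BASE = "https://en.wiktionary.org/wiki/";

    // Percent-encodes the word as UTF-8 and appends it to the base URL.
    // URLEncoder targets form data (a space becomes "+"), so this sketch
    // is only meant for single words without spaces.
    static String pageUrlFor(String word) {
        try {
            return BASE + URLEncoder.encode(word, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The verb from the post (to want), written with Unicode escapes so
        // the source file's own encoding cannot affect the result.
        String verb = "\u0445\u043e\u0442\u0435\u0442\u044c";
        System.out.println(pageUrlFor(verb));
        // prints https://en.wiktionary.org/wiki/%D1%85%D0%BE%D1%82%D0%B5%D1%82%D1%8C
    }
}
```

The resulting string could then be passed to `Jsoup.connect(...)` in place of a hand-typed address.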

I've already written some code, mainly using the jsoup library. Any help would be appreciated, thanks.
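import java.io.IOException;
import java.util.Scanner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ConjugationScraper { // class name is a placeholder; the post shows only main()
    public static void main(String[] args) {
        Scanner s = new Scanner(System.in);
        System.out.print("Enter website: ");
        String link = "http://" + s.nextLine(); // the website you want, typed without the scheme
        try {
            Document doc = Jsoup.connect(link).get();
            // All the information is embedded in a table, so collect every row.
            Elements rows = doc.getElementsByTag("tr");
            System.out.println("--------------------");
            for (Element row : rows) {
                System.out.println(row.text());
            }
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}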
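One likely cause of the "????????" output, though not the only possible one, is that `System.out` encodes text with a platform default charset that has no Cyrillic characters, so each one is replaced by `?` on the way out. The sketch below reproduces that effect in memory and then prints through a UTF-8 `PrintStream`. The class and method names are hypothetical, and it assumes the console itself can display UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConsoleEncodingDemo {

    // Writes text through a PrintStream that uses the given charset, then
    // decodes the bytes with the same charset to show what survived.
    static String printThrough(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(buffer, true, charset.name());
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: the name comes from an existing Charset object.
            throw new IllegalStateException(e);
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The verb from the post, as Unicode escapes to keep the source ASCII-only.
        String verb = "\u0445\u043e\u0442\u0435\u0442\u044c";

        // US-ASCII has no Cyrillic letters, so every character is replaced.
        System.out.println(printThrough(verb, StandardCharsets.US_ASCII)); // prints ??????

        // Wrapping System.out in a UTF-8 PrintStream keeps the characters
        // intact, provided the terminal is set to display UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(printThrough(verb, StandardCharsets.UTF_8));
    }
}
```

If this is the cause, replacing the `System.out.println(...)` calls in the loop with calls on a UTF-8 `PrintStream` like `utf8Out` may be enough. If the question marks persist, the console or IDE output encoding is the next thing to check.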