(How to) Copy/Manipulate all parsed output/displayed text for use as real text?



Ace.....
11-03-2012, 04:36 PM
This appears to be a major challenge for javascript programmers. :eek:

Apparently, there are no universal references to nodes further down the DOM list than the root node. You have to traverse the DOM, using a tree-navigating algorithm, until you find what you're looking for.

YET, we can simply drag our mouse across a web page of text, highlight that text, copy it, and post it to a text file........ so easy we don't think about it.

Therefore:
Is it possible to NOT traverse the DOM, but instead....


.... grab everything from the 1st to the last part of the display....
.... and paste it as text ....

In effect, a function to mimic the 'highlight, copy, & paste, commands': Where everything on a page is copied, but only the text gets pasted to text?

Notes:

For this application all the display pages would be the same, other than the changing text (arriving from google's translation server).
I understand that all browsers may be slightly different.
Clearly a universal solution would be ideal, but one for Chrome would be a good start.


I am confident that there will be a simple way to do this; however, I have yet to discover a programmer who knows that method (and I lack the in-depth experience needed to generate a solution to this apparently fundamental problem).

I found you via:
http://www.javascriptkit.com/domref/

There is a direct link to Coding Forums :)

VIPStephan
11-03-2012, 05:35 PM
This appears to be a major challenge for javascript programmers. :eek:

Apparently, there are no universal references to nodes further down the DOM list than the root node. You have to traverse the DOM, using a tree-navigating algorithm, until you find what you're looking for.

YET, we can simply drag our mouse across a web page of text, highlight that text, copy it, and post it to a text file........ so easy we don't think about it.

I admit that I wouldn’t know a solution either and sorry for kind of hijacking the thread but do you have any idea what’s happening in the programs’ background while you’re just highlighting, copying, and pasting the text without thinking about it? How do you come to the conclusion that it would be any simpler than a JavaScript DOM traversal?



I am confident that there will be a simple way to do this; however, I have yet to discover a programmer who knows that method (and I lack the in-depth experience needed to generate a solution to this apparently fundamental problem).


Well, depending on how many programmers you’ve got to know so far it seems to me that apparently there isn’t such a simple solution for this after all if it’s so hard to find one who knows any that meets your expectations. So, what makes you so confident that there is a JS solution other than traversing the whole document tree?

Again, sorry to get into a fundamental debate but I want to find the reasoning for your clear requirements of a non-traversal solution.

Logic Ali
11-03-2012, 06:56 PM
I just threw this together to try to retrieve all visible text. It could be substantially refined, but seems to work if run as the last item in the document.
Presumably you have the server-side code to retrieve data from another domain.


<script type='text/javascript'>

var e = document.getElementsByTagName('*'),
    t = '',
    tagElem,
    nodes,
    cn;

for( var i = 0; i < e.length; i++ )
{
  tagElem = e[ i ];
  nodes = tagElem.childNodes;

  if( !/SCRIPT/i.test( tagElem.nodeName ) )
    for( var j = 0; j < nodes.length; j++ )
      if( ( cn = nodes[ j ] ).nodeType == 3 )
        t += ' ' + cn.textContent;
}

alert(t);

</script>

Alternatively, there may be a solution using document.execCommand.

Ace.....
11-03-2012, 08:13 PM
Many thanks for the response:


do you have any idea what’s happening in the programs’ background while you’re just highlighting, copying, and pasting the text without thinking about it?

No..... I truly do not know what is happening, however....
.... The operation of highlighting is ancient (in computing terms), hence, in response to your next question:



How do you come to the conclusion that it would be any simpler than a JavaScript DOM traversal?

I presume (rightly or wrongly - please advise) that there exists, and has existed from an early stage of browser development, a simple command that allows 'displayed text' ie. unwritten text - meaning (say) innerHTML; to be referenced/grabbed, and used as genuine text.

Example:

pg 1 contains text in a text area.
I reference that text thru its path:


var pg_1 = document.getElementById("textarea_id").value;

The text exists as characters viewable by viewing source.
I now store that text:


localStorage.pg_1_text=pg_1;

I then display this text on another page entirely pg_2:


document.getElementById("result").innerHTML="<pre>" + localStorage.pg_1_text + "</pre>";

View source shows only the script, YET you can highlight the text, copy and paste.

This tells me that something very simple is happening (but I could be wrong).
Simple, because this highlighting, seems to be fundamental to human-pc interaction, and therefore was included as part of the basic structure of browsers, from the beginning (and since the beginning of the mouse, at least).

From a debate perspective:
Would the act of highlighting 'displayed text' be prone to all the divergent possibilities of programming, when ultimately, the actual display of text seems to be 'fundamental'..... all else can change, but display of text (and the highlighting of it) remains constant.

Logically, therefore, I assume that this places 'displayed text' as a primary function.

Therefore, I'm asking, whether it is possible to access this (supposed) primary function, rather than traversing the DOM - it is just a thought process that can be discounted.

But....

Discounted because, for example, this operation has been blocked due to security constraints OR Javascript cannot access this primary function.

I'm just explaining my thinking here - don't think I'm being a ttwwaatt.



Well, depending on how many programmers you’ve got to know so far it seems to me that apparently there isn’t such a simple solution for this after all if it’s so hard to find one who knows any that meets your expectations. So, what makes you so confident that there is a JS solution other than traversing the whole document tree?

This is a fair point you make.

My honest answers are:

In my experience of life, often (but not every time) - by re-stating the problem, referencing similar scenarios; solutions manifest themselves - hence why I talk about ancient highlighting of text - I never see it mentioned in these topics.
Google staff have found a standard way to reference/grab displayed text - send it to their servers, translate it, and send it back.

AND they can grab the entire page, translate it and send it back (with formatting/styles/functions/everything).

This makes me think that 'Joe Public' top programmers can do the same.

That is my reasoning for the possibility of a non-traversal solution.


Thanks again for the response.
The grilling is fair.

Like I said...... I'm not a ttwwaatt.
I have some ideas/leads to follow.

This was the start of the thread.
Maybe we can do something.

@Logic Ali Thanks also for the response.
It's late(ish) - we've gotta eat - maybe somebody else interested in this thread can take your script to pieces and give a view on it?

I'll also post my script leads later.

Thanks again to everybody interested in solving this problem (that so few appear to have solved)

:thumbsup:

rnd me
11-03-2012, 11:08 PM
have you played with String(getSelection()) ?

also, for any given dom element, element.textContent will produce the same text as a clipboard copy of that element when highlighted.
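
The two suggestions above can be tried side by side. What follows is only a minimal sketch: grabText is a name made up here, and it assumes a browser that provides window.getSelection. It prefers innerText where the browser has it, because innerText follows the rendered layout (line breaks, hidden elements) more closely than textContent, which simply concatenates every descendant text node.

```javascript
// Hypothetical helper: return the highlighted text if there is any,
// otherwise the text of the given element.
function grabText(element, selection) {
  // A Selection object stringifies to the highlighted text.
  var selected = selection ? String(selection) : '';
  if (selected !== '') {
    return selected;
  }
  // innerText is rendering-aware; textContent is the raw text of all descendants.
  if (typeof element.innerText === 'string') {
    return element.innerText;
  }
  return element.textContent || '';
}

// In a browser this might be called as:
//   grabText(document.body, window.getSelection());
```

Browsers without innerText fall through to textContent, so the line breaks can differ from one browser to the next.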

rnd me
11-03-2012, 11:11 PM
I just threw this together to try to retrieve all visible text. It could be substantially refined, but seems to work if run as the last item in the document.
Presumably you have the server-side code to retrieve data from another domain.


<script type='text/javascript'>

var e = document.getElementsByTagName('*'),
t = '',
tagElem,
nodes,
cn;

for( var i = 0; i < e.length; i++ )
{
tagElem = e[ i ];
nodes = tagElem.childNodes;

if( !/SCRIPT/i.test( tagElem.nodeName ) )
for( var j = 0; j < nodes.length; j++ )
if( ( cn = nodes[ j ] ).nodeType == 3 )
t += ' ' + cn.textContent;
}

alert(t)

</script>


that will dredge up <script>, <iframe>, <noscript>, and <style> tag text, not cool.

if you want visible text, at least start in document.body instead of the HTML element...

first loop through and run element.parentNode.removeChild(element) on every script and style tag before you grab the text.

Logic Ali
11-03-2012, 11:39 PM
that will dredge up <script>, <iframe>, <noscript>, and <style> tag text, not cool.






if( !/SCRIPT/i.test( tagElem.nodeName ) )

I used this to suppress script tags, with the option of adding any others as required.

felgall
11-04-2012, 12:27 AM
The code I used the last time I wrote a script that needed to access all the text in the web page was:


var node, txtnodes;
nodewalk = function(node, str) {
if (typeof str != 'array') str = [];
for (var i = 0; i < node.length; i++) {
if (node[i].hasChildNodes() && 'SCRIPT' !== node[i].nodeName)
str = nodewalk(node[i].childNodes,str);
if (3 === node[i].nodeType)
str.push(node[i]);
return str;
}
txtnodes = nodewalk(document.getElementsByTagName('body')[0]);

If you don't want to include the alternate text for anyone whose browser doesn't support iframes then you'd add && 'IFRAME' !== node[i].nodeName after the test for scripts. Styles don't go in the body so that wouldn't be a problem and <noscript> has been dead since the DOM was implemented to replace it so that shouldn't be a problem either (but if you do still have antiquated code that uses it you can skip its content the same way as for script and iframe).

I haven't come across a browser where the DOM doesn't return the nodeNames in uppercase for a page served as HTML and so have never bothered using a regular expression to make it insensitive to case. If the page were XHTML then the nodeNames would be lowercase but then you'd need to replace other parts of the code as well.

Ace.....
11-05-2012, 03:34 PM
I just threw this together to try to retrieve all visible text. It could be substantially refined, but seems to work if run as the last item in the document.
Presumably you have the server-side code to retrieve data from another domain.


<script type='text/javascript'>

var e = document.getElementsByTagName('*'),
t = '',
tagElem,
nodes,
cn;

for( var i = 0; i < e.length; i++ )
{
tagElem = e[ i ];
nodes = tagElem.childNodes;

if( !/SCRIPT/i.test( tagElem.nodeName ) )
for( var j = 0; j < nodes.length; j++ )
if( ( cn = nodes[ j ] ).nodeType == 3 )
t += ' ' + cn.textContent;
}
alert(t)
</script>

Alternatively, there may be a solution using document.execCommand.

I've just run a test using Logic Ali's code, involving 3 pages, using onclick to launch the functions:
It definitely works, though we lose the line breaks.
I ran it with output to <pre> (local_store_3.html) and without <pre>.

Interestingly with <pre> the text is written low down the page - don't know why.

With <pre> removed, the text is written at the top beneath the typed text.

While this has to be seen as a success, from a reader's perspective line breaks are critical.

When highlighting, copying, and pasting to text (a web page); the text pasted does contain the line breaks (or perhaps it recognises paragraph tags).

I'll now try felgall's code. :)

local_store_1.html

<!DOCTYPE html>
<html>
<head>
<script>
function store_pg_1()
{var pg_1 = document.getElementById("styled").value;
localStorage.pg_1_text=pg_1;}
</script>
</head>
<body OnLoad="document.myform.styled.focus();">
<div id="result">
<form name="myform">
<textarea name="styled" id="styled" onclick="store_pg_1()"> </textarea>
<br><br>
<input type="text" name="txt3" id = "Nstyled" value="input text3" onclick="store_pg_1()"><br>
</form>
</div>
</body>
</html>

local_store_2.html

<!DOCTYPE html>
<html>
<head>
<script>
function store_pg_2()
{
var e = document.getElementsByTagName('*'), t = '', tagElem, nodes, cn; for( var i = 0; i < e.length; i++ ) { tagElem = e[ i ]; nodes = tagElem.childNodes; if( !/SCRIPT/i.test( tagElem.nodeName ) ) for( var j = 0; j < nodes.length; j++ ) if( ( cn = nodes[ j ] ).nodeType == 3 ) t += ' ' + cn.textContent; }
localStorage.pg_2_text=t;
}
</script>
</head>
<body onclick='store_pg_2()'>
<div id="stuff">
<p>Actual typed text<br>line break, actual typed text</p>
</div>
<div id="result">
<script>
if(typeof(Storage)!=="undefined")
{document.getElementById("result").innerHTML="<pre>" + localStorage.pg_1_text + "</pre>";}
else
{document.getElementById("result").innerHTML="Sorry, your browser does not support web storage...";}
</script>
</div>
</body>
</html>
local_store_3.html

<!DOCTYPE html>
<html>
<body>
<div id="stuff">
<p>More typed text here<br>break more text here also</p>
</div>
<div id="result">
<script>
if(typeof(Storage)!=="undefined")
{document.getElementById("result").innerHTML="<pre>" + localStorage.pg_2_text + "</pre>";}
else
{document.getElementById("result").innerHTML="Sorry, your browser does not support web storage...";}
</script>
</div>
</body>
</html>

rnd me
11-05-2012, 03:42 PM
you can replace "</p>" with "</p>\n" and "<br>" with "\n" to get the line breaks back.
maybe do the same for "</div>" or whatever blocks your content uses.

a little text transformation can go a long way.

if you are using text, you want <pre>, just " str".trim() it to remove leading whitespace.

i prefer "white-space: pre-wrap" or "pre-line", since i hate scrollbars...
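
The replacements described above could be sketched roughly like this. htmlToPlainText is a made-up name, and regular expressions are not a real HTML parser, so treat it as a starting point for simple, well-formed fragments only. Entities such as &amp; are left as they are.

```javascript
// Hypothetical helper: turn an HTML fragment into plain text, keeping
// line breaks where <br> and closing block tags were.
function htmlToPlainText(html) {
  return html
    .replace(/<br\s*\/?>/gi, '\n')        // <br>, <br/> and <br /> become newlines
    .replace(/<\/(p|div)>/gi, '</$1>\n')  // newline after each closing </p> or </div>
    .replace(/<[^>]+>/g, '')              // drop whatever tags remain
    .trim();                              // remove leading/trailing whitespace
}
```

For the test paragraph from page 2, htmlToPlainText('<p>Actual typed text<br>line break, actual typed text</p>') gives the two lines separated by a single newline.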

rnd me
11-05-2012, 03:57 PM
Styles don't go in the body so that wouldn't be a problem and <noscript> has been dead since the DOM was implemented to replace it

two things:
1. styles DO go in the body, that's what the scoped attrib is for. i will buy and ship you a great little book on html5, just PM me, but you can and should review the spec that's getting used by all major browsers. just sayin...

2. how can the dom replace noscript in a browser without javascript?

Ace.....
11-05-2012, 04:08 PM
The code I used the last time I wrote a script that needed to access all the text in the web page was:


var node, txtnodes;
nodewalk = function(node, str) {
if (typeof str != 'array') str = [];
for (var i = 0; i < node.length; i++) {
if (node[i].hasChildNodes() && 'SCRIPT' !== node[i].nodeName)
str = nodewalk(node[i].childNodes,str);
if (3 === node[i].nodeType)
str.push(node[i]);
return str;
}
txtnodes = nodewalk(document.getElementsByTagName('body')[0]);

If you don't want to include the alternate text for anyone whose browser doesn't support iframes then you'd add && 'IFRAME' !== node[i].nodeName after the test for scripts. Styles don't go in the body so that wouldn't be a problem and <noscript> has been dead since the DOM was implemented to replace it so that shouldn't be a problem either (but if you do still have antiquated code that uses it you can skip its content the same way as for script and iframe).

I haven't come across a browser where the DOM doesn't return the nodeNames in uppercase for a page served as HTML and so have never bothered using a regular expression to make it insensitive to case. If the page were XHTML then the nodeNames would be lowercase but then you'd need to replace other parts of the code as well.

For some reason, I failed to get this code to work. :confused:
It is missing a '}'

Here is the <head> of page 2:


<head>
<script>
function store_pg_2()
{
var node, txtnodes; nodewalk = function(node, str)
{ if (typeof str != 'array') str = []; for (var i = 0; i < node.length; i++)
{ if (node[i].hasChildNodes() && 'SCRIPT' !== node[i].nodeName) str = nodewalk(node[i].childNodes,str); if (3 === node[i].nodeType) str.push(node[i]);
return str;
}
txtnodes = nodewalk(document.getElementsByTagName('body')[0]);
localStorage.pg_2_text=txtnodes;
}
</script>
</head>

I figure the '}' should follow 'str.push(node[i]);'.

But when placed there, no text is written into page 3.

It doesn't say 'undefined' like when the bracket is placed elsewhere.

Anybody any thoughts?
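
For what it's worth, the missing '}' is only one of the problems in that snippet. typeof never returns 'array' (an array reports 'object'), so str is reset to an empty array on every call. The function is handed the body element itself, which has no length, so the loop never runs. And an array of text nodes written to localStorage would be stored as "[object Text]" strings rather than the text itself. A possible rework is sketched below; it is not felgall's original code, collectText and SKIP are names made up here, and it collects the text strings rather than the nodes.

```javascript
// Sketch of a recursive text walker. It works on anything that has
// nodeType, nodeName, nodeValue and childNodes, which includes real DOM nodes.
var SKIP = { SCRIPT: true, STYLE: true, NOSCRIPT: true, IFRAME: true };

function collectText(node, out) {
  out = out || [];                      // array is created on the first call only
  if (node.nodeType === 3) {            // 3 = text node
    out.push(node.nodeValue);
  } else if (node.nodeType === 1 && !SKIP[node.nodeName.toUpperCase()]) {
    var kids = node.childNodes;
    for (var i = 0; i < kids.length; i++) {
      collectText(kids[i], out);        // recurse into each child
    }
  }
  return out;                           // return after the loop, not inside it
}

// In a browser this might be used as:
//   localStorage.pg_2_text = collectText(document.body).join(' ');
```

Joining with a space mirrors Logic Ali's version; line breaks would still need separate handling.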

Ace.....
11-05-2012, 05:27 PM
you can replace "</p>" with "</p>\n" and "<br>" with "\n" to get the line breaks back.
maybe do the same for "</div>" or whatever blocks your content uses.

a little text transformation can go a long way.

if you are using text, you want <pre>, just " str".trim() it to remove leading whitespace.

i prefer "white-space: pre-wrap" or "pre-line", since i hate scrollbars...

At the moment, the 3 page test, is to create some displayed text in page 2. using innerHTML, and confirm it can be made to appear in page 3 correctly.

The actual principle being:
To have the 3 pages (as iframes) in a container page.


I type text into the page 1 textarea (purely using Enter to create line breaks).

This text appears in page 2. where google-translate will read it, translate it, and send it back as 'displayed text'

I then want to transfer that displayed text to page 3. maintaining the line breaks.

Question
Do I somehow swap the line breaks for <br> before it gets written as innerHTML (in page 2) ?

Ie. Do NOT write the innerHTML into <pre></pre>.
Instead, the displayed text would be line separated by <br>.
It would then, presumably be returned by google with the <BR>'s intact.

Or is there a better solution?

Ace.....
11-05-2012, 07:15 PM
Question
Do I somehow swap the line breaks for <br> before it gets written as innerHTML (in page 2) ?


Actually <br> doesn't work.
I've just tested it.

So.....
... using Logic Ali's script:

It reads all the displayed text on page 2, and passes it to the variable 't'.
This gets stored in localStorage under the name pg_2_text.

I then use the code on page 3:


document.getElementById("result").innerHTML= "<pre>" + localStorage.pg_2_text + "</pre>";

This works for everything typed into the page 1 textarea. Ie. line breaks are passed thru page 2, to page 3.

This actually was/is the objective. :thumbsup:

What I need to do next, is test this with google translate.

Apologies to everybody for the confusion. :o
In mitigation: It's so easy to lose track of tests, what's been changed, and were the tests consistent in the first place?

I think.... before any further mods are effected (if any are required), a test with google translate should be effected.

So, apart from my human failings....... this is looking to be an awesomely powerful script. :D

Fingers crossed that everything comes back from the google servers, as is needed.

rnd me
11-06-2012, 12:05 AM
if this is something that's going to be saved, it's better to use <textarea>.value so people can cut and paste and to prevent the browser from wrongly fixing certain unicode chars.

Ace.....
11-06-2012, 01:49 AM
if this is something that's going to be saved, it's better to use <textarea>.value so people can cut and paste and to prevent the browser from wrongly fixing certain unicode chars.

The major problem with textarea is that g-translate doesn't see it.
I think it assumes that as 'user input' it has no right translating it.

So:
With all the thinking and research that's been going on..... it has led me to believe that the simplest form is going to be best:


var text = document.body.innerText;
localStorage.pg_2_text=text;

This standardises everything, and enables simple comparative testing for state of change (which will be needed as the text changes from English to French).

It works..... I've tested it.



I can write into the textarea of page 1
localStorage it as page_1_text

Refreshing page 2 causes page_1_text to be written as innerHTML <pre>(will try innerText tomorrow).

Google auto reads the page & sends back its translated text as innerHTML.

I can click the text, and the translation is localStorage(d) as page_2_text.

Refreshing page 3 causes page_2_text to be written as innerHTML<pre>.


It seems perfect!

We've come a long way around the houses.... but I guess: that is what it's all about. :)

I know what you're saying (about the universality) but I think that this might be a good place to move on to other interesting aspects (with another thread..... so if improvements can be made on this current issue then fine).

There is the question of 'checking the state of page 2'..... when it's changed to (say) French, it will need to be stored, and page 3 reloaded.

So this is gonna require an 'onEnter' event to complete a paragraph.
The whole text then passed via localStorage & refresh to page 2.

An indeterminate number of milliseconds later, the text will change to French, and will require passing to page 3.

LocalStorage can handle the passing of data from page to page.
What's gonna be interesting is the sequence of events that occur when the author hits the 'Enter' key (to complete a paragraph).

Clearly page 2 has to refresh AND (I think) start a compare loop, checking the page 2 innerHTML with the page_1_text variable.
When it changes..... stop comparing, and pass it to page 3.
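
The compare loop might look something like the sketch below. One deliberate difference from the description above: it compares against a snapshot of page 2 taken when the watcher starts, not against page_1_text, because the innerHTML of the result div also contains the <pre> wrapper and would never equal the raw text. makeChangeWatcher is a made-up name, and the 250 ms interval in the usage note is an arbitrary guess.

```javascript
// Hypothetical helper: build a check() function that calls onChange once,
// the first time readText() returns something different from the snapshot.
function makeChangeWatcher(readText, onChange) {
  var baseline = readText();   // what the page shows before translation
  var done = false;
  return function check() {
    if (!done) {
      var current = readText();
      if (current !== baseline) {
        done = true;           // stop comparing after the first change
        onChange(current);
      }
    }
    return done;
  };
}

// In page 2 this might be wired up as (ids and keys as used earlier in the thread):
//   var check = makeChangeWatcher(
//     function () { return document.getElementById('result').innerHTML; },
//     function (translated) { localStorage.pg_2_text = translated; }
//   );
//   var timer = setInterval(function () { if (check()) clearInterval(timer); }, 250);
```

Whether the translation arrives in one step or in several partial updates is not known here; if it arrives in pieces, the first change may not be the finished text.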

Ok... so I've just started a new thread on the old thread.
Sorry... it's nearly 3:00am.
I think I'll chuck the towel in for tonight. ;)

Ace.....
11-06-2012, 01:01 PM
Actually I think I'm gonna use textContent throughout, to replace innerText & innerHTML.

This creates a standard text based system.

Once this is fully functional, it will be worth looking again at incorporating Ali's script, which is more geared for wysiwyg web editing, with 2 monitors, or a big widescreen.

Sure it would be better, cos it would do everything, but keeping it simple to get the tool finished, is I reckon, a good idea. :)

Ace.....
11-06-2012, 05:23 PM
Actually I've tested in Firefox, and it prefers innerHTML.
So I've reinstated that across the board.

I guess the moral of the story is that there are few genuine standards. :(

:)

Ace.....
11-07-2012, 04:05 PM
Actually, the moral of the story is that there are no standards.

It is not just a matter of sniffing to present innerText or textContent.
textContent doesn't provide line breaks.

I found a v.good script to deal with this:

http://clubajax.org/examples/plain-text-vs-textcontent-vs-innertext/

A real tour de force.

It may be that I need to use this script.
I noted there were a couple of comments at the bottom re: some minor issues.

I need to first review each of the text transfers.
At the moment, google are returning my line broken text, without the line breaks. :rolleyes:

rnd me
11-07-2012, 11:53 PM
fun fact: in chrome, those linebreaks in textContent might come back if you first set the container's white-space to "pre" or "pre-wrap" or "pre-line". Surprisingly, this affects textareas as well; a fact which burned me for years before i figured out what was going on.


silly question comes to mind:
after hearing about your linebreak and dom issues, i have to ask; are you using google's translate API?

if so, it expects plain text, and it preserves line breaks just fine. You would use a textarea to send a string back and forth to google. This works simply and perfectly.


If you are using some kind of hack where google is trying to translate an HTML page you publish, that's going to be a lot trickier and google can change their formatting at the drop of a hat, breaking your script.

bottom line, use APIs and you will have FAR fewer inconsistencies like you've noted.

Ace.....
11-09-2012, 05:15 PM
Sorry for the delay in responding.
After seeing that the line break problems revealed themselves during the operations that passed the values back and forth, I decided to separate the problems.

Script to pass values
Script to deal with the values


The former being discussed here:
http://www.codingforums.com/showthread.php?p=1290393

The objective being to create a system that can pass the variables from start to finish.
With that in place, the different variables can then be experimented with.

On that issue, it may be, that the sensible route is to pass for translation everything (Google seems to handle it well), but, having said that....
.... Let's not forget that the text IS being entered into a textarea - so: text and line breaks.

Anyway, the script to pass the variables is progressing nicely. Possibly only one more major bridge to cross, before light becomes visible. :)

However you do raise a couple of very good points:


If you are using some kind of hack where google is trying to translate an HTML page you publish, that's going to be a lot trickier and google can change their formatting at the drop of a hat, breaking your script.


This IS a concern.
It is not so much a question of hacking..... though to be fair it may be.
Almost everything works with google published code, that is provided for their users.

Ie. If they change their published commands, then everybody on the planet is gonna have to re-install the 'two part codes' (head & body).

However, there are no event handlers (it would seem) that can cater for our specific needs.
It may be that we need to extract a 'script status' (gleaned from the chrome log), a number of which look to be very standard, and unlikely to change more than might be expected.

That then brings your other point re API



silly question comes to mind:
after hearing about your linebreak and dom issues, i have to ask; are you using google's translate API?

if so, it expects plain text, and it preserves line breaks just fine. You would use a textarea to send a string back and forth to google. This works simply and perfectly.

bottom line, use APIs and you will have FAR fewer inconsistencies like you've noted.


Hmmm!
That is very different to how g-translate works at a user/webmaster level.
It does not read textareas.

Yet even if it did...... the key to this system is that the translated text is then passed back to google.
So the 'translation done' event must still be recognised.

I'll definitely have a look into the API affair.
Perhaps it provides just those tools I'm looking for. :thumbsup:



fun fact: in chrome, those linebreaks in textContent might come back if you first set the container's white-space to "pre" or "pre-wrap" or "pre-line". Surprisingly, this affects textareas as well; a fact which burned me for years before i figured out what was going on.


Thanks for that; that info could prove very useful considering the problems I was having in that area. :)

Ace.....
11-09-2012, 06:16 PM
are you using google's translate API?

Re: my last post, one of the things I said I'd do, was to look at google API.

I've just done so.

To be honest...... it seems like another mountain to climb, when I'm already close to peaking this one (thanks to community support).

It's not that it's a 'paid for club' per se, but it involves setting up payment accounts on usage that will be beyond my control.
Like if some clever sod sets a character feed to your input (just for fun). :eek:

Then on top of that there is all the coding simply to deal with authorisation etc. before beginning to find out whether, in their library, some other bod has developed the script that is anyway required.
(and if not, it will have to be anyway developed)

I think it was worthwhile looking at it as an option, but, at the moment, the project is advancing well, and everybody following it is potentially getting something out of it.

I think I'll stick with codingforums.com for the foreseeable future. ;)

Ace.....
11-09-2012, 11:32 PM
fun fact: in chrome, those linebreaks in textContent might come back if you first set the container's white-space to "pre" or "pre-wrap" or "pre-line". Surprisingly, this affects textareas as well; a fact which burned me for years before i figured out what was going on.


Who would have thought that it could be such a hassle to provide word breaking between words? :confused:

I've spent a few hours trying to tidy the presentation, and progress has been made.
This is a good reference on the points you mention (and everything):
http://www.w3.org/TR/css3-text/

The only problem is going to be compatibility.
I'm testing between Firefox and chrome at the mo, cos I can't be bothered to boot into windows. :rolleyes:

Each handles pre differently in HTML but perhaps in css they may be more unified. :)

Ace.....
11-10-2012, 12:47 AM
Who would have thought that it could be such a hassle to provide word breaking between words? :confused:


At last, a major breakthrough with presentation. :D

You are forced to use:

Inline-styles!!!

The penny dropped when I could see the English copy appear on page perfectly, only to be replaced by the translation, all collapsed.

I'd been trying every word break rule that W3c has to offer (and they have loads BTW), but I'd been using a style sheet.

Style sheets, as I've discovered, just don't work with post-load, written-in HTML.

For crying out loud...... All the frigging posts and articles I've read; and none of them made reference to inline-styles!

It's a great victory (for perseverance).

Style-sheets are useful, but without adequate knowledge, they suck you into complacency.

A daft bit of knowledge, having such an impact! :eek: :) :D

Everything is possible now......Wow!

Ace.....
11-11-2012, 05:25 PM
Okay....... a working version at last. :D

There are still display issues, but the major problems were cleared up after the tip by rnd me, that led me to discover the need for inline styles. :thumbsup:

Immediate display problems relate to:

Establishing the workspace height for any screen/browser

I finally went with pixels, just to get the system working, but by zooming out, the workspace is wasted.

There are I'm sure, ways around this....... It does look a little noobish, by failing to provide scalable height. :(

Line spacing - Pasted web site text is more difficult to read

When typing text, one would automatically hit enter twice to insert 2 linebreaks (as per this post for paragraphs).
Therefore typed text is perfect(?)

When pasting a web page contents, typically only one linebreak is used, because in HTML, the additional spacing is automatically provided.

I don't see any way around this other than by human intervention.
Save to a text file, and literally add line breaks for clarity, and then use for translation.

Browser Wars
I've yet to test the prog on anything other than Chrome & Firefox.
Both will be very up to date cos I run Ubuntu 12.04.

There are sure to be issues with IE and older kit, but we'll see.
You may well discover a few gaping holes in it (but that's good). ;)

Anyway, here it is: (http://www.max-haut-debit.fr/translator/translator_page.html)
(read thru the short instructions first, to learn the best way forward) ;)
Edit:
Also a minor problem with the Google inserted header.
It is cut off at bottom.
zoom out zoom in and the problem disappears.

Ace.....
11-12-2012, 03:26 PM
I just ran it in iexplorer 9.
Pretty disastrous. :(

Not only are the frames not displaying as desired, but the line breaks are not functioning.

What is the general thinking now, vis a vis two sets of code: FFox/Chrome & iExplorer?

Obv.... I understand that it means 2 updates.
Yet that might be a price to pay, in order to avoid the extreme hassle surrounding browser compatibility.

Or is the way forward, different sections of code, that will be read, if a browser is x or y?

Or have I just failed to implement the correct code, and hence the iE9 fail?

xelawho
11-12-2012, 05:07 PM
the problem I believe is that a textarea uses \n to denote a line break whereas innerHTML uses a <br>

IE seems to be the only one that is really fussy about this, but you can fix it by including a regex to replace those \n's in the function on your "start" page:



function translateIt(text) {
text = text.replace(/\n/g, "<br>");
parent.frames['theframe'].googleTranslateElementInit(text);
}
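
A slightly more defensive variant, offered only as a sketch: textToHtml is a made-up name. It also escapes &, < and > so that typed text cannot be read as markup once it is assigned to innerHTML, and it allows for \r\n line endings, which some browsers may report for textarea values.

```javascript
// Hypothetical helper: make textarea text safe to assign to innerHTML,
// then turn its line breaks into <br> tags.
function textToHtml(text) {
  return text
    .replace(/&/g, '&amp;')           // escape & first so the entities below survive
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\r\n|\r|\n/g, '<br>');  // \r\n, lone \r and lone \n all become <br>
}
```

It would slot into translateIt in place of the single replace call, if the escaping turns out to be wanted.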

Ace.....
11-12-2012, 05:21 PM
...
Internet Explorer 9.......... Sorted! :cool:

Okay the display problems in iE9 relate to the need for a doc declaration.

Originally I had been using:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
I then removed it because I was using new html5 declarations.

Apparently, FFox & Chrome can manage without a doc declaration.
Research informed me that iE8 & above require it.

So I chose the new HTML5 declaration, which is incredibly simple, and is supposedly backwards compatible (I read).


<!DOCTYPE html>

I also learned that I must clear the web browser cache every time I run a test.
I discovered this, when FFox didn't like my // method to disable certain style declarations.
I deleted them completely, yet FFox console still found them, even after closing the tab and starting again.

Actually, I have found FFox console to pick up far more errors than Chrome console.
It reminded me I must declare the character encoding (which I did).

With trepidation I loaded the sys in iE9, and YES!
It displayed seemingly perfectly.

AhHa......... so maybe, with good coding, the 3 main religions can be catered for in a single document. ;)

What a relief (you can imagine).
I've gotta test it a bit more but so far so good.

:D


