Gammon Forum

Large text files

It is now over 60 days since the last post. This thread is closed.


Posted by Nnewbie   (5 posts)  Bio
Date Mon 14 Jul 2003 08:00 PM (UTC)
Message
I am using the FileSystemObject to read from a text file and write to another text file. I am using the Read(n) method of the TextStream object for optimal performance. The program works for files up to 4 megabytes but does not work for files larger than this. How can I overcome this? Thanks in advance. The code:


Sub Prgm_Parse8

    const ForReading = 1
    const TristateFalse = 0
    dim sName
    dim outputFile
    dim fso
    dim objFile
    dim objTS
    set fso = CreateObject("Scripting.FileSystemObject")
    sName = "W:\Copy of 2L14_9C016_B.wrl"
    set objFile = fso.GetFile(sName)
    set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)

    '***********-----
    contents = objTS.Read(objFile.Size)

    dim fFile, contents, aFile, iLine, iLines, sLine, sWord

    'loading file into a string
    set fFile = fso.OpenTextFile( sName, 1 ) '1 = ForReading
    'fFile.Close

    'splitting string into an array
    aFile = Split( contents, vbCrLf )

    'scanning lines
    iLines = UBound( aFile ) + 1
    iLine = 0
    do while iLine < iLines
        sLine = aFile( iLine )

        dim c
        c=Left(sLine,1)
        While c=Chr(9) or c=" "
            sLine=Mid(sLine,2)
            c=Left(sLine,1)
        Wend
        sWord=Left(sLine,3)

        if ( sWord = "DEF" ) then

            str1 = left(sLine, 5)
            str2 = mid(sLine, 7)
            str3 = UCase(Mid(sLine, 6, 1))

            dim check_for_num
            check_for_num = IsNumeric(str3)

            if (check_for_num = true) then
                sLine = str1 + "X" + str2
            else
                sLine = str1 + str3 + str2
            end if

            dim position

            position = instr(sLine, ("Separator"))
            position = (position - 5)
            sLine = mid(sLine, 5, position)
            sLine = CleanString( sLine, Chr(34) & "/,.*-&:" & " ")
            str1 = left(str1, 4)
            sLine = str1 + sLine + " Separator {"
        end if

        if ( sWord = "USE" ) then

            str1 = left(sLine, 5)
            str2 = mid(sLine, 7)
            str3 = UCase(Mid(sLine, 6, 1))

            'dim check_for_num
            check_for_num = IsNumeric(str3)

            if (check_for_num = true) then
                sLine = str1 + "X" + str2
            else
                sLine = str1 + str3 + str2
            end if

            sLine = mid(sLine, 5)
            sLine = CleanString( sLine, Chr(34) & "/,.*-&:" & " ")
            str1 = left(str1, 4)
            sLine = str1 + sLine
        end if

        '- put it back in array with
        aFile( iLine ) = sLine

        iLine = iLine + 1
    loop

    'rejoining array items into a string, writing to output file
    contents = Join( aFile, vbCrLf )

    set fFile = fso.CreateTextFile( sName, True )
    fFile.Write contents

    fFile.Close

end sub

function CleanString( sInput, sSymbols )

    dim sClean
    sClean = ""

    dim l, i, c
    l = len( sInput )
    i = 1

    '- while more input string
    do while i <= l

        '- snip out a character
        c = mid( sInput, i, 1 )

        '- if not a symbol, add to cleaned string
        if InStr( sSymbols, c ) = 0 then
            sClean = sClean & c
        end if

        i = i + 1
    loop

    CleanString = sClean
end function






Posted by Nick Gammon   Australia  (23,158 posts)  Bio   Forum Administrator
Date Reply #1 on Mon 14 Jul 2003 09:09 PM (UTC)

Amended on Tue 15 Jul 2003 05:52 AM (UTC) by Nick Gammon

Message
You are doing pretty well even reading 4 MB in a file system object. I would break it down (e.g. 4 MB at a time). The efficiency will not be much less; however, you might have a problem at the boundary, as a chunk may end in the middle of a line.
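
One way to handle that boundary problem is to carry the unfinished tail of each chunk over into the next read. The following is only a minimal sketch, assuming CRLF line endings (as the Split on vbCrLf in the original code does). The file path, the ChunkSize value and the ProcessLine routine are hypothetical placeholders, not taken from the original post.

```vbscript
Option Explicit

Const ForReading = 1
Const ChunkSize = 65536   ' characters per Read call (arbitrary, assumed value)

Dim fso, inFile, buffer, start, pos
Set fso = CreateObject("Scripting.FileSystemObject")
Set inFile = fso.OpenTextFile("C:\temp\input.txt", ForReading)   ' hypothetical path

buffer = ""
Do While Not inFile.AtEndOfStream
    ' Append the next chunk to whatever partial line was left over
    buffer = buffer & inFile.Read(ChunkSize)

    ' Hand every complete line in the buffer to ProcessLine
    start = 1
    pos = InStr(start, buffer, vbCrLf)
    Do While pos > 0
        ProcessLine Mid(buffer, start, pos - start)
        start = pos + 2
        pos = InStr(start, buffer, vbCrLf)
    Loop

    ' Keep only the incomplete tail for the next pass
    buffer = Mid(buffer, start)
Loop

' The file may not end with a line break, so flush the remainder
If Len(buffer) > 0 Then ProcessLine buffer
inFile.Close

' Hypothetical placeholder for the real per-line work
Sub ProcessLine(sLine)
    WScript.Echo sLine
End Sub
```

Because the tail is always prepended to the next chunk, a line break that is itself split across two reads is still recognised once both halves are in the buffer.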

Can't you just do it in smaller chunks - eg a line at a time? If time is really critical I wouldn't be using a script engine anyway; they are not really intended for global find-and-replace. For instance, Perl might be better and faster.

Beyond the natural cluster size on the disk (say, 4 KB), the efficiency gains from larger reads are minimal.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Meerclar   USA  (733 posts)  Bio
Date Reply #2 on Tue 15 Jul 2003 03:09 AM (UTC)
Message
If you break the files up, it *should* be possible to do so by line count rather than total size, to avoid breaking mid-line. Nick does have a good point, though: Perl is most likely better suited to something like this, especially if time is a factor.
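
As a rough illustration of batching by line count, the sketch below reads a fixed number of lines at a time with ReadLine, so a batch can never end mid-line. The path, the LinesPerBatch value and the ProcessBatch routine are assumptions made for the example only.

```vbscript
Option Explicit

Const ForReading = 1
Const LinesPerBatch = 10000   ' arbitrary batch size (assumed value)

Dim fso, inFile, batch(), count
ReDim batch(LinesPerBatch - 1)

Set fso = CreateObject("Scripting.FileSystemObject")
Set inFile = fso.OpenTextFile("C:\temp\input.txt", ForReading)   ' hypothetical path

Do While Not inFile.AtEndOfStream
    ' Fill the batch with whole lines only
    count = 0
    Do While count < LinesPerBatch And Not inFile.AtEndOfStream
        batch(count) = inFile.ReadLine
        count = count + 1
    Loop
    ProcessBatch batch, count
Loop
inFile.Close

' Hypothetical placeholder: handles the first n entries of the array
Sub ProcessBatch(lines, n)
    Dim i
    For i = 0 To n - 1
        WScript.Echo lines(i)
    Next
End Sub
```

The last batch is usually only partly filled, which is why the count is passed along with the array.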

Meerclar - Lord of Cats
Coder, Builder, and Tormenter of Mortals
Stormbringer: Rebirth
storm-bringer.org:4500
www.storm-bringer.org

Posted by Nnewbie   (5 posts)  Bio
Date Reply #3 on Tue 15 Jul 2003 01:39 PM (UTC)
Message
Time is not a factor. However, the program simply freezes when it tries to process larger files. What is the limit on the number of characters an array can hold?

Posted by Nick Gammon   Australia  (23,158 posts)  Bio   Forum Administrator
Date Reply #4 on Tue 15 Jul 2003 09:20 PM (UTC)
Message
I've already said "Can't you just do it in smaller chunks - eg a line at a time?" - I'll put it a bit more strongly:

Don't do it that way. Read a line at a time.

You are reading in a text file of over 4 MB into a single variable, which VBScript is probably not designed to do, and then splitting that file up into lines anyway using Split, so now you have two copies of the 4 MB (one in the variable, one in the array). The array may well have over 200,000 entries in it if each line is 20 bytes long.

You are then attempting to join that huge array back into a single variable and write it back. It only takes one bit of VB that doesn't cope with huge amounts of data and it won't work.

You only need a minor change - change the Read to ReadLine and then process a line at a time (writing each line as you go as well).

If time isn't an issue this should work fine, and you will have it working for any size file.
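
A minimal sketch of that change, assuming the output goes to a temporary file that replaces the original once both streams are closed. The paths and the TransformLine function are hypothetical; TransformLine stands in for the DEF/USE handling from the original loop.

```vbscript
Option Explicit

Const ForReading = 1

Dim fso, inFile, outFile, sName, sTemp, sLine
sName = "C:\temp\input.wrl"    ' hypothetical path
sTemp = sName & ".tmp"         ' hypothetical temporary output file

Set fso = CreateObject("Scripting.FileSystemObject")
Set inFile = fso.OpenTextFile(sName, ForReading)
Set outFile = fso.CreateTextFile(sTemp, True)   ' True = overwrite if it exists

' Only one line is held in memory at any time
Do While Not inFile.AtEndOfStream
    sLine = inFile.ReadLine
    outFile.WriteLine TransformLine(sLine)
Loop

inFile.Close
outFile.Close

' Swap the result in only after both files are closed
fso.DeleteFile sName
fso.MoveFile sTemp, sName

' Hypothetical placeholder for the per-line DEF/USE processing
Function TransformLine(sLine)
    TransformLine = sLine
End Function
```

Note that WriteLine always appends a line break, so the output ends with one even if the input did not.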


- Nick Gammon

www.gammon.com.au, www.mushclient.com

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.