Wikipedia:Reference desk/Archives/Computing/2014 August 22

From Wikipedia, the free encyclopedia
Computing desk
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


August 22

Arrays With Definite, But Unknown, Size

I'm working on an image enhancement program (my own tinkering) in Ruby and have a particularly involved algorithm - it takes a few hours on a non-HD image. Generally, I don't mind waiting, but, in this case, it is becoming a real pain. So, I am looking, essentially, to run this specific process in a more performance focused language (C, C++, C#, Java, etc.). My stumbling block is the following: I need to use several arrays whose size will depend on the resolution of the input image, but I do not want to use dynamic arrays - for any given input, the size of the array is fixed, so is there any language that supports static sized arrays that are declared using a variable instead of a literal number? (For example, something like "v = image_x_size; new array = stuff[v]"). Thank you for any help - sorry for any lack of clarity, I don't do a lot of actual programming so am not exactly sure how to phrase stuff - or, if there are any other things anyone might suggest; I have no problem learning language features, I'm just not sure which ones to learn.Phoenixia1177 (talk) 05:36, 22 August 2014 (UTC)[reply]

Let me add this to the question: if I declare a really large array, but only use a portion of it, does this have a major performance impact? For example, if I know every image is under 5000 x 5000, could I just use arrays of size 5000 without penalty?Phoenixia1177 (talk) 05:38, 22 August 2014 (UTC)[reply]
I don't see why you wouldn't want to use dynamic arrays. Just set the initial size big enough for your input. Bubba73 You talkin' to me? 05:52, 22 August 2014 (UTC)[reply]
Honestly, I don't really know - I've read a few places that they are slower and one of the reason large arrays are slow in languages like Ruby, Python, etc. I don't know much about various languages outside of the few I'm comfortable with; I just don't want to spend time on something that isn't going to end up doing what I want anyways.Phoenixia1177 (talk) 06:09, 22 August 2014 (UTC)[reply]
Well, what do you mean exactly by "dynamic arrays"? If you're talking about something like the vector class of the Standard Template Library, sure, there's a little performance hit every time it reallocates (not a huge hit in most cases, but some). But in C or C++, once you figure out the size of your array, you can just allocate an array of that size on the fly using malloc or new, and then there's no more reallocation. That's "dynamic" in the sense that it doesn't get allocated until the program reaches that line, but I don't see why it would be slower than anything else.
It's true that that memory is on the "heap" rather than the "stack". But for really large arrays, you really don't want them on the stack anyway. Stack-corruption errors are really annoying and hard to find. --Trovatore (talk) 08:42, 22 August 2014 (UTC)[reply]
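A minimal sketch of the "allocate once, after you know the size" approach described above, written in C++ with std::vector rather than raw malloc or new. The helper names make_plane and index_of are hypothetical, as is the choice of one byte per pixel; the point is only that the size comes from a run-time value and the storage lives on the heap.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one brightness value per pixel, sized from image
// dimensions that are only known at run time. The storage is on the heap.
std::vector<std::uint8_t> make_plane(std::size_t width, std::size_t height) {
    // A single allocation of exactly width * height zero-initialised elements;
    // nothing is reallocated afterwards unless the vector is resized.
    return std::vector<std::uint8_t>(width * height);
}

// Row-major index of pixel (x, y) in a plane that is `width` pixels wide.
inline std::size_t index_of(std::size_t x, std::size_t y, std::size_t width) {
    return y * width + x;
}
```

A caller would read the image header first, then do something like `auto plane = make_plane(w, h); plane[index_of(x, y, w)] = value;`. A flat vector with a computed index is used here instead of a vector of vectors so that the whole image sits in one contiguous block.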
C# Lists or Arraylists would probably meet the need. More information is here. That page also describes how you can create an array of dynamically allocated size by first creating a list. A rather clunky possible alternative would be to use something like String.Split to create the array. --Phil Holmes (talk) 08:30, 22 August 2014 (UTC)[reply]
Matlab and its free clone Octave let you declare an array's size based on input, the way you want. While Matlab can be slower than C at many things, it is often faster at array/vector operations (very highly optimized code in C and Fortran underlies Matlab), and most non-experts (like me) find it much easier to code in. The package NumPy will also let you do what you want in Python. (As an aside, I've never heard anyone claim that porting code to Java was a good way to increase performance... but then again I have no experience with Ruby :) ) SemanticMantis (talk) 17:47, 22 August 2014 (UTC)
I usually just use a large static array for that type of thing. I agree that dynamic arrays are a pain. Large static arrays usually don't slow things down much, with one huge exception. If you exceed the RAM on your PC and have to go to paging space, things slow to a crawl immediately.
Here's another option: In Fortran, you can have parameters you define in the declarations, say MAX_X and MAX_Y, then use those to define your array sizes and everything else. So, for example, you could specify the highest pixel in both directions as (MAX_X, MAX_Y), somewhere inside the program. In this way, you only have to change two variables and recompile when switching to another image size.
A third option would be to basically do as above, but have a wraparound program to read the image size, set the parameters, and recompile for you. StuRat (talk) 21:10, 22 August 2014 (UTC)[reply]
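For comparison, here is a rough C++ analogue of the fixed-upper-bound idea (the Fortran MAX_X / MAX_Y parameters, or the "every image is under 5000 x 5000" assumption from the question). This is only a sketch: the array name pixels, the helper fill_used_region and the one-byte element type are assumptions made for illustration, not anything from the original program.

```cpp
#include <cstddef>
#include <cstdint>

// Compile-time upper bounds, analogous to the MAX_X / MAX_Y parameters above.
constexpr std::size_t MAX_X = 5000;
constexpr std::size_t MAX_Y = 5000;

// Static storage (not the stack): 25,000,000 bytes reserved for the whole run,
// zero-initialised before main() starts.
static std::uint8_t pixels[MAX_Y][MAX_X];

// Hypothetical helper: fill only the width x height corner actually in use.
// Returns false if the image does not fit within the compiled-in bounds.
bool fill_used_region(std::size_t width, std::size_t height, std::uint8_t value) {
    if (width > MAX_X || height > MAX_Y) return false;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            pixels[y][x] = value;
    return true;
}
```

The unused part of the array costs address space but is never touched by the loops, which is why an oversized static array usually does little harm until the total no longer fits in RAM. The array is declared at namespace scope on purpose; a local array of this size would likely overflow the stack.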

I've measured the performance of dynamic arrays versus static arrays, and there is very little difference. However, what you don't want to do is to start with a small dynamic array and keep adding small amounts to it. If the program can tell from the input what size it needs, then set the dynamic array for that size. If it can't tell then start with a dynamic array that might be large enough, and if it isn't then increase the size by maybe 20%. Bubba73 You talkin' to me? 00:36, 23 August 2014 (UTC)[reply]
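The difference between sizing a dynamic array up front and letting it grow piecemeal can be observed directly. The sketch below uses std::vector; count_reallocations is a hypothetical helper that treats every change in capacity() as one reallocation. Note that std::vector chooses its own growth factor (implementation-defined, commonly 1.5x or 2x) rather than the 20% figure mentioned above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: append n values and count how many times the vector's
// capacity changed, which indicates that it reallocated its storage.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(n);  // size known up front: allocate once
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set to true the function returns 0, because the single reserve call happens before counting starts; without it the count depends on the library's growth policy.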

I'm not a Ruby programmer, but have you tried a different implementation of the language? Both Rubinius and JRuby claim to be faster than the official interpreter, at least for some programs.-gadfium 02:53, 23 August 2014 (UTC)[reply]

Thank you for all of the suggestions/help. I've since taken a look into Octave and Fortran - both seem reasonably easy to get to do what I want - C is another alternative I'm considering; I know enough of the basics, I'm pretty sure I could get that working fast as well. Currently, the problem part needs to loop through every pixel for each pixel and do more than a few calculations, so the loop body ends up running around 4 trillion times. Ruby is nice, but it is not meant to handle that in a reasonable interval - I've looked into variants of Ruby in the past; my understanding is that since YARV became the main Ruby interpreter, the gains are, best case, marginal (if any) - but even speeding things up by a factor of 3 or 4 would still be way too long a wait time. As for Java, I've been told that for good code, once everything is loaded up, it can outperform compiled languages; I have no idea if that is true; plus, I've been meaning to learn it for a while. Thank you all again :-) Phoenixia1177 (talk) 08:40, 23 August 2014 (UTC)
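To put the "around 4 trillion" figure in perspective, here is a small worked calculation in C++. The 2000 x 1000 image size is an assumption chosen only because it reproduces that order of magnitude; the thread does not state the actual dimensions. The windowed variant is likewise hypothetical and ignores clipping at the image edges.

```cpp
#include <cstdint>

// Inner-loop iterations when every pixel is visited once for every pixel.
std::uint64_t all_pairs_iterations(std::uint64_t width, std::uint64_t height) {
    const std::uint64_t pixels = width * height;
    return pixels * pixels;
}

// Iterations when each pixel only looks at a (2r + 1) x (2r + 1) window
// around itself (edge clipping ignored for simplicity).
std::uint64_t windowed_iterations(std::uint64_t width, std::uint64_t height,
                                  std::uint64_t radius) {
    const std::uint64_t side = 2 * radius + 1;
    return width * height * side * side;
}
```

For a 2000 x 1000 image, all_pairs_iterations gives 4,000,000,000,000, while windowed_iterations with a radius of 10 gives 882,000,000. The work grows with the square of the pixel count in the first case and with the square of the radius in the second, so the constant cost per iteration (where the choice of language matters) is multiplied by a very large number either way.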

What makes arrays slow in most languages is bounds checking. C++'s std::vector is (usually) not bounds checked in release builds, which makes it just as fast as anything you could do in C, and still safer (since you don't have to worry about memory leaks). There is a cost to dynamically resizing an array, but it's not that large, and more importantly you only pay that cost if you actually resize it. If you could implement your algorithm with a static array, that means you never need to resize it, so you might as well use a dynamic array. -- BenRG (talk) 23:29, 23 August 2014 (UTC)[reply]
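The checked/unchecked distinction for std::vector can be sketched as follows. The two helper names are hypothetical; operator[] and at() are the standard member functions. Some standard libraries offer debug or hardened modes that add checks to operator[], so "unchecked" here describes a typical release build, not a guarantee.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Unchecked access: operator[] performs no bounds check in a typical release
// build, so the caller must guarantee that i < v.size().
int read_unchecked(const std::vector<int>& v, std::size_t i) {
    return v[i];
}

// Checked access: at() throws std::out_of_range for an invalid index.
// Returns true and stores the element in `out` only if the index is valid.
bool read_checked(const std::vector<int>& v, std::size_t i, int& out) {
    try {
        out = v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

In a hot inner loop the usual choice is operator[] with the loop bounds derived from size(), keeping at() for places where the index comes from outside the program.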
Fortran seems to just skip the array bounds checking to speed things up. They must figure that the programmer can add in bounds checking, if they think it's worth the cost. StuRat (talk) 00:24, 24 August 2014 (UTC)[reply]
FORTRAN makes pointer aliasing impossible, so an entire category of difficult array bounds violations are eliminated at run-time. This means that FORTRAN can compile code that is no faster than equivalently-written C code, but is strictly safer from illegal memory access.
But all this analysis assumes our OP's performance is due to array element access times, which is altogether an unlikely explanation for a five hour runtime. The OP probably has a complicated algorithm with many nested iterations and similarly complex execution patterns. Switching languages, or improving memory access times, will not address that issue. The OP needs to profile the execution time and start to understand where the execution time is being spent.
Nimur (talk) 17:37, 24 August 2014 (UTC)[reply]
I appreciate that that would be the normal assumption to make, but in this case it is the arrays that are the issue. I've been using Ruby for 15+ hours a week for over a decade and am very familiar with all the little ins and outs of it (and various performance hacks, which I've tested with benchmarks time and again). Moreover, I've been messing with the actual program I'm working on for 2 years, and it has hundreds of other subroutines, all of which take a major speed hit proportionate to how many pixels are considered per pixel - regardless of computation details, all of them take a major hit as the radius increases; and there is nothing left to take advantage of to mitigate this. In this case, yes, the computation is complex, but not absurdly so, and I have benchmarked and tried variants; it doesn't make much of a difference. I'm not expecting to rewrite it in another language and have it take half a second, but I do anticipate not having to wait all day to test it on a few images; and I do know that compiled C is going to be, at least, 6-7 times faster, which should be good enough for my ends. Phoenixia1177 (talk) 21:27, 24 August 2014 (UTC)
Guaranteed non-aliasing of arrays allows for automatic vectorization where it would otherwise be impossible, leading to much faster compiled code. That's why restrict was added to C. I don't see how it prevents out-of-bounds accesses.
I agree that bounds checking is unlikely to be the problem here, if the code is written in Ruby. Ruby is slow mainly because it's (typically) interpreted, and array bounds checks are probably a negligible part of the execution time. Bounds checks might be a significant part of the run time in (typically) compiled safe languages like Java. -- BenRG (talk) 18:22, 25 August 2014 (UTC)[reply]
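As an illustration of the non-aliasing promise mentioned above: restrict is a C keyword (since C99) and is not part of standard C++, but GCC, Clang and MSVC accept __restrict as an extension. The sketch below assumes one of those compilers; scale_into is a hypothetical function. The qualifier only tells the compiler that the two arrays never overlap, which can make the loop easier to vectorize. It performs no bounds checking of any kind.

```cpp
#include <cstddef>

// __restrict is a compiler extension in C++ (GCC, Clang, MSVC).
// It promises that dst and src never overlap; passing overlapping arrays
// would be undefined behaviour. It does not check indices against n.
void scale_into(float* __restrict dst, const float* __restrict src,
                std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * factor;
}
```

Whether the compiler actually emits vector instructions depends on the optimization level and target; the qualifier merely removes one obstacle.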
It's not bounds checking in Ruby that is the problem, it is that I'm dealing with large arrays/arrays with lots of memory, which has been a bottleneck for the language. I wanted to know about arrays because I was switching to a language where that didn't necessarily have it, not so much because it was the problem with arrays - though, I'm not very clear on that. I imagine that there is a very horrible way of logically breaking up the data across smaller arrays (more in number), etc. in Ruby, and you can get more performance that way in that language (did it on a smaller scale issue for pathfinding in a small game), but that also makes it more of a pain to work with - and I'd like to learn a new language anyways, Ruby is great, but it is not fast anyways.Phoenixia1177 (talk) 22:38, 25 August 2014 (UTC)[reply]
Some other thoughts:
1) What's the size of each array element ? If dealing with R, G, B values of 0-255, for example, a 1-byte integer is enough for each, although you might need to shift the values using an offset, if the range is -128 to +127, for example.
2) What's the available RAM ? I'm still worried that you are exceeding it and going to (extremely slow) paging space.
3) Do you free up memory after you are done with it ?
4) Is anything else running when your program is ? You might even consider running it outside of Windows. StuRat (talk) 15:45, 26 August 2014 (UTC)[reply]
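The element-size question in point 1 is easy to quantify. The sketch below assumes a 5000 x 5000 image (the upper bound mentioned in the original question) with 3 values per pixel, and assumes signed samples in the range -128 to 127 for the offset example; image_bytes and to_unsigned_byte are hypothetical helpers. It counts raw storage only and ignores any per-object overhead a language runtime might add.

```cpp
#include <cstdint>

// Bytes needed for a width x height image with `channels` values per pixel,
// each stored in an element of `bytes_per_value` bytes (raw storage only).
constexpr std::uint64_t image_bytes(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t channels,
                                    std::uint64_t bytes_per_value) {
    return width * height * channels * bytes_per_value;
}

// Shift a signed sample in [-128, 127] into the unsigned byte range [0, 255].
constexpr std::uint8_t to_unsigned_byte(int sample) {
    return static_cast<std::uint8_t>(sample + 128);
}
```

With one byte per value, image_bytes(5000, 5000, 3, 1) is 75,000,000 bytes; with 4-byte integers it is 300,000,000 bytes. Both fit comfortably in a few gigabytes of RAM, so if a program runs out of memory at this scale, the overhead is probably coming from how the elements are represented rather than from the pixel data itself.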
Thank you all for all of your help :-) I've rewritten the program in C++, and it executes in a much more reasonable time frame; smaller test images that had been taking a little over 2 hours now finish in around fifteen minutes, so that's reasonable enough for me (it is an intensive computation, so I expect that it will take a while). I'm not sure what Ruby was doing memory-wise, which is one of the main things that I don't like about it; though, Ruby and large arrays don't go well together, it seems to use way more memory for them than it needs. At least on my computer, using 1.9.3, I can get an "out of memory" error creating a 12000 x 12000 array of arrays with 3 integers in them; I have 12 GB of RAM, so that really shouldn't be an issue. Phoenixia1177 (talk) 01:42, 28 August 2014 (UTC)

Image extraction

How to extract this particular image in its original high-res? It looks like over 2000 px at full zoom, but every Firefox extension I tried saves it in about 700x800 px. Brandmeistertalk 13:16, 22 August 2014 (UTC)[reply]

The image is built from 100 or so smaller images at various scales (much like the way services like Google Maps work). Some websites use Zoomify to implement this; others use similar services - I don't know which system that particular site uses. If it was Zoomify, Commons hosts a script which extracts the elements and stitches them together - commons:Help:Zoomable images/dezoomify.py. Other systems will use a different scheme, so dezoomify would need to be adapted (for all I know it already has been). You might like to talk to the people who maintain that (at Commons and SourceForge) as they're surely aware of the general field of Zoomifyalikes. And note National Portrait Gallery and Wikimedia Foundation copyright dispute in passing. -- Finlay McWalterTalk 13:56, 22 August 2014 (UTC)
My Windows command line for Dezoomify doesn't recognize "coll" in the image's URL when I try to run Dezoomify. I initially thought the native hi-res may be embedded in the webpage's source code. Maybe someone can extract the image anyway, I'd thank in advance. Brandmeistertalk 15:00, 22 August 2014 (UTC)[reply]
You can always do a screen grab. The advantage is that it doesn't care how the image was built. The disadvantage is that it's limited to your screen resolution (although you might be able to stitch multiple screen grabs together), and may have some junk to clean up around the edges (the window frame, etc.). On my screen, it looks like 2×3 screen grabs should do it, although only 2×2 are needed if I trim off some of the empty top and bottom. If you don't know how to do a screen grab, tell us your Operating System and we will provide details. StuRat (talk) 20:42, 22 August 2014 (UTC)[reply]

Why is Java 7 prevalent over Java 8?

Hello. I realize the above question is vague in some ways, but i see no better way to classify my question in a short amount of text.

I want to know why most people are using Java 7, and not Java 8.... and also why when looking to download java, a version of 7 is offered instead of Java 8. This is puzzling to me. Isn't Java 8 supposed to be an improved product by definition? Here is some info i have regarding this:

  • This webpage shows that Java 7 is used by more people than Java 8.
  • The official java download page offers a download of Java 7, "Version 7 Update 67" at the time of writing this.
  • While looking for answers to this question, I found that Oracle has already begun development of Java 9, to be released in 2016.
  • According to Java (programming language), Java 8 was released as recently as March of this year.
  • I have run into a site or two which say that Java 8 is horrible and reasons why.

I can understand that for certain products it may take a while for users to switch, if they do at all; versions of Windows Operating System come to mind. However, java isn't in this same boat in my opinion. First, Microsoft Windows costs money and an upgrade means vast and obvious changes to a person's experience; Java is free and users (non-programmers) notice hardly anything different except hopefully newer versions run faster. Second, Java runtime can (when feature enabled) show you when your version is out of date. If most users had this active, i would expect at least to see a numerical majority (50% + 1) to have Java 8, because it takes minutes to upgrade, not months! Yet oracle itself says the "latest version of java" is Java 7, update 67.

If a person were not tech savvy or not looking specifically for Java 8, it would be easy to not even think it exists! WHY??!? On top of this madness, there is already a version 9 coming when hardly a soul knows of 8. What is going on?!?!

216.173.144.188 (talk) 17:02, 22 August 2014 (UTC)[reply]

Java.com is for ordinary people looking to update the Java runtime to run the software they already have - the "not tech savvy or not looking specifically for Java 8" people you describe. Oracle wants these people to stick with Java 7 - they explain at "Why is Java 8 not available on java.com?" That's why it doesn't say it's the "latest", it says it's the "recommended" version. Java 8 is for developers, and for people who know (because their developers told them) that they should switch. Java is enterprise software, it's not like buying the latest iPhone - you want something that will definitely work with your existing software. Existing software, running under the new runtime, won't use most of its new features, and probably won't see much substantive improvement in performance. When developers change their software to make use of the new capabilities of Java 8, the software will require Java 8. Enterprise software has long support windows and enterprises don't want to change things every time a technology supplier iterates their platform. -- Finlay McWalterTalk 17:35, 22 August 2014 (UTC)

"you want something that will definitely work with your existing software"

... so this plus reading the link you provided..... suggests that Java 8 is still not fully stable, even though it has been in existence for years now! I must be honest in saying that this confuses me as much as the original question. However, your answer did help clear up things too! Thanks!

216.173.144.188 (talk) 17:43, 22 August 2014 (UTC)[reply]

(Multiple EC) AFAIK it's generally quite common that the Java auto update will only update within the version you are using. (For that matter, so too with a fair amount of software.) See e.g. [1], which suggests that on 32-bit Windows the Java autoupdater only moved users of 6 to 7 in December 2012. As per our article Java version history, this was over a year after release. There are obvious reasons for this, including that you may break software which doesn't work with the newer version (which could happen with a minor version change but is far less likely). For similar reasons, the Java site normally doesn't give the latest version straight away.
Also while cost may be a factor, there are clearly many other factors why people don't update. E.g. many people are still using Windows 8 including me rather than updating to 8.1. Many people are using a version of Windows without the latest service pack. Heck for a long time many people were still using IE6 on Windows XP even though Internet Explorer 7 has existed since October 2006.
Nil Einne (talk) 17:48, 22 August 2014 (UTC)[reply]
No, it doesn't suggest that at all. What it means is that existing software needs to be retested when a major platform change is made, and until then the conservative thing to do is to run it on the existing platform (with small fixes and security patches designed to have no impact on the platform's public API). Java 8 is designed as much as is practical to be entirely compatible with Java 7 - but the point of a major number increment is to indicate that you don't guarantee perfect compatibility, and a few programs, which use outdated functionality and do odd things, may break and have to be fixed. Oracle's guide to where they've deliberately broken compatibility is here. Given the gigantic size of the Java platform, that's a pretty small list of pretty obscure things - and most of them are the final retiral of things that were already marked deprecated in Java 7 (things conscientious developers already should have eschewed years ago). Nothing I've said, or linked to, should remotely give you the idea that Java 8 is "not fully stable". -- Finlay McWalterTalk 17:56, 22 August 2014 (UTC)[reply]


Finlay: my mistake. I wasn't even thinking about deprecation, and how that might have an effect on existing programs being run on newer Java!

216.173.144.188 (talk) 18:05, 22 August 2014 (UTC)[reply]

It's worth considering what a "version number" means. To the marketing department, and to a non-technical public, a bump of the headline number usually means "it's better now". An iPhone 4 is better (presumably in every conceivable way) than an iPhone 3. Very often that means the company would like you to buy a new one, because the one you have is now "old". For the engineering department, a major version number often means something different - it signifies a compatibility horizon: that everything that worked on any older Java 7 should work on any newer Java 7, but if they want to make changes that might break something, they have to save that up for Java 8. There's often conflict inside large organisations about the competing meaning of these numbers: Sun (Java's originators) went through much internal gnashing of teeth with the version numbering for their Solaris operating system (see Solaris_(operating_system)#Version_history) and the JDK (see Java version history) with technically inexplicable jumps in version numbering. -- Finlay McWalterTalk 18:18, 22 August 2014 (UTC)

my external backup hard drive has started to pop up as "AutoPlay" and Windows 7 says "Format this disk"

I have been using the same computer and same external backup drive for years. Now suddenly (after the last Windows 7 update?), it repeatedly pops up when I'm on the web, with AutoPlay flashing, then a Windows message "Format this disk". I solved it to some extent by pinning the AutoPlay to my taskbar. But it still occurs occasionally and I've had to turn off sound so I don't hear that irritating Windows "bing".

I've tried setting all my AutoPlay options to other programs, and I'm not going to format the disk - it has my backup on it. Any ideas? Thanks, Parabolooidal (talk) 19:11, 22 August 2014 (UTC)[reply]

First, are you sure the drive is supposed to show up as unformatted? With some backup software, this could happen, but if I were you I would confirm that everything in your backup is fine.
If you've done that, then I would remove any drive letter or mount location for the partition(s) via Disk Management. I believe Windows will not assign a drive letter even if you detach and reattach the drive in the future, but I can't recall exactly. (Depending on whether it will be a problem for other drives, you could always disable automounting of new drives completely. I do know that if you do this, external drives already given a mount point should keep it even if detached and reattached.) Of course, this could be rather annoying if your backup software can't handle the drive lacking a mount location; personally I would consider better software in that case.
Nil Einne (talk) 08:31, 25 August 2014 (UTC)[reply]