A surjection theorem for maps with singular perturbation and loss of derivatives

In this paper we introduce a new algorithm for solving perturbed nonlinear functional equations which admit a right-invertible linearization, but with an inverse that loses derivatives and may blow up when the perturbation parameter $\epsilon$ goes to zero. These equations are of the form $F_\epsilon(u)=v$ with $F_\epsilon(0)=0$, $v$ small and given, $u$ small and unknown. The main difference from the by now classical Nash-Moser algorithm is that, instead of using a regularized Newton scheme, we solve a sequence of Galerkin problems by means of a topological argument. As a consequence, there are no quadratic terms in our estimates. For problems without perturbation parameter, our results require weaker regularity assumptions on $F$ and $v$ than earlier ones, such as those of Hörmander. For singularly perturbed functionals, we allow $v$ to be larger than in previous works. To illustrate this, we apply our method to a nonlinear Schrödinger Cauchy problem with concentrated initial data studied by Texier-Zumbrun, and we show that our result improves significantly on theirs.


Introduction
The basic idea of the inverse function theorem (henceforth IFT) is that, if a map $F$ is differentiable at a point $u_0$ and the derivative $DF(u_0)$ is invertible, then the map itself is invertible in some neighbourhood of $u_0$. It has a long and distinguished history (see [20] for instance), going back to the inversion of power series in the seventeenth century, and has been extended since to maps between infinite-dimensional spaces. If the underlying space is Banach, and if one is only interested in the local surjectivity of $F$, that is, the existence, near $u_0$, of a solution $u$ to the equation $F(u) = v$ for $v$ close to $F(u_0)$, one just needs to assume that $F$ is of class $C^1$ and that $DF(u_0)$ has a right-inverse $L(u_0)$. The standard proof is based on the Picard scheme $u_n = u_{n-1} - L(u_0)(F(u_{n-1}) - v)$, which converges geometrically to a solution of $F(u) = v$ provided $\|F(u_0) - v\|$ is small enough. In the $C^2$ case, the Newton algorithm $u_n = u_{n-1} - L(u_{n-1})(F(u_{n-1}) - v)$ uses the right-invertibility of $DF(u)$ for $u$ close to $u_0$, and provides local quadratic convergence.
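As a minimal numerical sketch of the difference between the two schemes, one can run both on a scalar toy problem. The map $F(u) = u + u^2$ and the helper names `picard`, `newton` and `exact_root` are illustrative assumptions of this sketch, not taken from the paper; in one dimension the right-inverse $L(u)$ is simply $1/DF(u)$.

```python
import math

def F(u):
    # Toy map with F(0) = 0 and DF(0) = 1 (an illustrative assumption).
    return u + u * u

def DF(u):
    return 1.0 + 2.0 * u

def picard(v, u0=0.0, tol=1e-12, max_iter=500):
    """Picard scheme: the inverse L(u0) is frozen at the base point."""
    L0 = 1.0 / DF(u0)
    u = u0
    for n in range(1, max_iter + 1):
        step = L0 * (F(u) - v)
        u -= step
        if abs(step) < tol:
            return u, n
    raise RuntimeError("Picard iteration did not converge")

def newton(v, u0=0.0, tol=1e-12, max_iter=500):
    """Newton scheme: the inverse L(u) is recomputed at every iterate."""
    u = u0
    for n in range(1, max_iter + 1):
        step = (F(u) - v) / DF(u)
        u -= step
        if abs(step) < tol:
            return u, n
    raise RuntimeError("Newton iteration did not converge")

def exact_root(v):
    # Positive root of u^2 + u - v = 0, used only as a reference value.
    return (math.sqrt(1.0 + 4.0 * v) - 1.0) / 2.0

if __name__ == "__main__":
    v = 0.1
    print("Picard (solution, iterations):", picard(v))
    print("Newton (solution, iterations):", newton(v))
    print("reference root:", exact_root(v))
```

Both iterations reach the same root for this small right-hand side; the Newton variant needs fewer steps, which reflects the quadratic versus geometric convergence described above.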
Date: May 27, 2021.
In functional analysis, $u$ will typically be a function. In many situations the IFT on Banach spaces will be enough, but in the study of Hamiltonian systems and PDEs, one encounters cases when the right-inverse $L(u)$ of $DF(u)$ loses derivatives, i.e. when $L(u)F(w)$ has fewer derivatives than $u$ and $w$. In such a case, the Picard and Newton schemes lose derivatives at each step. The first solutions to this problem are due, on the one hand, to Kolmogorov [19] and Arnol'd [2], [3], [4] who investigated perturbations of completely integrable Hamiltonian systems in the analytic class, and showed that invariant tori persist under small perturbations, and, on the other hand, to Nash [23], who showed that any smooth compact Riemannian manifold can be embedded isometrically into a Euclidean space of sufficiently high dimension (see footnote 1).
In both cases, the fast convergence of Newton's scheme was used to overcome the loss of regularity. Since Nash was considering functions with finitely many derivatives, he had to introduce a sequence of smoothing operators S n , in order to regularize L(u n−1 )(F (u n−1 ) − v), and the new scheme was u n = u n−1 − S n L(u n−1 )(F (u n−1 ) − v) .
An early presentation of Nash's method can be found in Schwartz' notes [24]. It was further improved by Moser [22], who used it to extend the Kolmogorov-Arnol'd results to C k Hamiltonians. The Nash-Moser method has been the source of a considerable amount of work in many different situations, giving rise in each case to a so-called "hard" IFT. We will not attempt to review this line of work in the present paper. A survey up to 1982 will be found in [15]. In [17], Hörmander introduced a refined version of the Nash-Moser scheme which provides the best estimates to date on the regularity loss. We refer to [1] for a pedagogical account of this work, and to [5] for recent improvements. We also gained much insight into the Nash-Moser scheme from the papers [7], [8], [9], [10], [26].
The question we want to address here is the following. The IFT implies that the range of F contains a neighborhood V of v 0 = F (u 0 ). What is the size of V?
In general, when one tries to apply directly the abstract Nash-Moser theorem, the estimates which can be derived from its proof are unreasonably small, many orders of magnitude away from what can be observed in numerical simulations or physical experiments. Moreover, precise estimates for the Nash-Moser method are difficult to compute, and most theoretical papers simply do not address the question.
So we shall address instead a "hard" singular perturbation problem with loss of derivatives. The same issue appears in such problems, as we shall explain in a moment, but it takes a simpler form: one tries to find a good estimate on the size of V as a power of the perturbation parameter ε. Such an asymptotic analysis has been carefully done in the paper of Texier and Zumbrun [26] which has been an important source of inspiration to us, and we will be able to compare our results with theirs. As noted by these authors, the use of Newton's scheme implies an intrinsic limit to the size of V.
Footnote 1. Nash's theorem on isometric embeddings was later re-proved by Gunther [14], who found a different formulation of the problem and was able to use the classical IFT in Banach spaces.
Let us explain this in the "soft" case, without loss of derivatives. Suppose that for every $0 < \varepsilon \le 1$ we have a $C^2$ map $F_\varepsilon$ between two Banach spaces $X$ and $Y$, such that $F_\varepsilon(0) = 0$, and, for all $\|u\| \le R$, $||| D_u F_\varepsilon(u)^{-1} ||| \le \varepsilon^{-1} M$ and $||| D^2_{uu} F_\varepsilon(u) ||| \le K$.
Then the Newton-Kantorovich Theorem (see [11], section 7.7 for a comprehensive discussion) tells us that the solution $u_\varepsilon$ of $F_\varepsilon(u) = v$ exists for $\|v\| < \frac{\varepsilon^2}{2KM^2}$, and this is essentially the best result one can hope for using Newton's algorithm, as mentioned by Texier and Zumbrun in [26], Remark 2.22. Note that the use of a Picard iteration would give a similar condition.
However, in this simple situation where no derivatives are lost, it is possible, using topological arguments instead of Newton's method, to find a solution $u$ provided $\|v\| \le \varepsilon R/M$: one order of magnitude in $\varepsilon$ has been gained. The first result of this kind, when $F$ is $C^1$ and $\dim X = \dim Y < \infty$, is due to Wazewski [27] who used a continuation method. See also [18] and [25] and the references in these papers, for more general results in this direction. In [12] (Theorem 2), using Ekeland's variational principle, Wazewski's result is proved in Banach spaces, assuming only that $F$ is continuous and Gâteaux differentiable, the differential having a uniformly bounded right-inverse (in §2 below, we recall this result, as Theorem 5).
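The gain of one power of $\varepsilon$ can be made concrete with a few lines of arithmetic. The sketch below only evaluates the two sufficient smallness conditions quoted in the text, $\varepsilon^2/(2KM^2)$ for Newton-Kantorovich and $\varepsilon R/M$ for the topological argument; the function names and the sample values of $K$, $M$, $R$ are hypothetical choices made for illustration.

```python
def newton_kantorovich_radius(eps, K, M):
    """Admissible size of v from the Newton-Kantorovich condition."""
    return eps ** 2 / (2.0 * K * M ** 2)

def topological_radius(eps, R, M):
    """Admissible size of v from the topological (non-Newton) argument."""
    return eps * R / M

def gain(eps, R, K, M):
    """Ratio of the two radii; algebraically equal to 2*K*M*R/eps."""
    return topological_radius(eps, R, M) / newton_kantorovich_radius(eps, K, M)

if __name__ == "__main__":
    # Sample constants, chosen arbitrarily for the illustration.
    K, M, R = 1.0, 1.0, 1.0
    for eps in (1.0, 0.1, 0.01, 0.001):
        print(eps,
              newton_kantorovich_radius(eps, K, M),
              topological_radius(eps, R, M),
              gain(eps, R, K, M))
```

With $K$, $M$ and $R$ held fixed, the ratio grows like $1/\varepsilon$ as $\varepsilon \to 0$, which is the "one order of magnitude" mentioned above.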
Our goal is to extend such a topological approach to "hard" problems with loss of derivatives, which up to now have been tackled by the Nash-Moser algorithm. A first attempt in this direction was made in [12] (Theorem 1), in the case when the estimates on the right-inverse do not depend on the base point, but it is very hard to find examples of such situations. The present paper fulfills the program in the general case, where estimates on the inverse depend on the base point.
In [10], Berti, Bolle and Procesi prove a new version of the Nash-Moser theorem by solving a sequence of Galerkin problems $\Pi'_n F(u_n) = \Pi'_n v$, $u_n \in E_n$, where $\Pi_n$ and $\Pi'_n$ are projectors and $E_n$ is the range of $\Pi_n$. They find the solution of each projected equation thanks to a Picard iteration: $u_n = \lim_{k\to\infty} w^k$ with $w^0 = u_{n-1}$ and $w^{k+1} = w^k - L_n(u_{n-1})\,\Pi'_n(F(w^k) - v)$, where $L_n(u_{n-1})$ is a right inverse of $D(\Pi'_n F|_{E_n})(u_{n-1})$. So, in [10] the regularized Newton step is not really absent: it is essentially the first step in each Picard iteration. As a consequence, the proof in [10] involves quadratic estimates similar to the ones of more standard Nash-Moser schemes. Moreover, Berti, Bolle and Procesi assume the right-invertibility of $D(\Pi'_n F|_{E_n})(u_{n-1})$. This assumption is perfectly suitable for the applications they consider (periodic solutions of a nonlinear wave equation), but in general it is not a consequence of the right-invertibility of $DF(u_{n-1})$, and this restricts the generality of their method as compared with the standard Nash-Moser scheme.
As in [10], we work with projectors and solve a sequence of Galerkin problems. But in contrast with [10], the Newton steps are completely absent in our new algorithm: they are replaced by the topological argument from [12] (Theorem 2), ensuring the solvability of each projected equation. Incidentally, this allows us to work with functionals $F$ that are only continuous and Gâteaux-differentiable, while the standard Nash-Moser scheme requires twice-differentiable functionals. Our regularity assumption on $v$ also seems to be optimal, and even weaker than in [17]. Moreover, our method works assuming either the right-invertibility of $D(\Pi'_n F|_{E_n})(u)$ as in [10], or the right-invertibility of $DF(u)$ (in the second case, our proof is more complicated). But in our opinion, the main advantage of our approach is the following: there are no more quadratic terms in our estimates. As a consequence we can deal with larger $v$'s, and this advantage is particularly obvious in the case of singular perturbations.
To illustrate this, we will give an abstract existence theorem with a precise estimate of the range of F for a singular perturbation problem: this is Theorem 3 below. Comparing our result with the abstract theorem of [26], one can see that we have weaker assumptions and a stronger conclusion. Then we will apply Theorem 3 to an example given in [26], namely a Cauchy problem for a quasilinear Schrödinger system first studied by Métivier and Rauch [21]. Texier and Zumbrun use their abstract Nash-Moser theorem to prove the existence of solutions of this system on a fixed time interval, for concentrated initial data. Our abstract theorem allows us to increase the order of magnitude of the oscillation in the initial data. After reading our paper, Baldi and Haus [6] have been able to increase even more this order of magnitude, using their own version [5] of the Newton scheme for Nash-Moser, combined with a clever modification of the norms considered in [26] and an improved estimate on the second derivative of the functional. In contrast, our proof follows directly from our abstract theorem, taking exactly the same norms and estimates as in [26], and without even considering the second derivative of the functional.
The paper is constructed as follows. In Section 2, we present the general framework: we are trying to solve the equation F ε (u) = v near F ε (0) = 0, when F ε maps a scale of Banach spaces of functions into another and admits a right-invertible Gâteaux differential with "tame estimates" involving losses of derivatives and negative powers of ε. After giving our precise assumptions, we state our main theorem. Section 3 is devoted to its proof. In Section 4, we apply it to the example taken from Texier and Zumbrun [26], and we compare our results with theirs.
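We shall assume that to each $\Lambda \in [1, \infty)$ is associated a continuous linear projection $\Pi(\Lambda)$ on $V_0$, with range $E(\Lambda) \subset V_S$. We shall also assume that the spaces $E(\Lambda)$ form a nondecreasing family of sets indexed by $[1, \infty)$, while the spaces $\mathrm{Ker}\,\Pi(\Lambda)$ form a nonincreasing family. In other words:
$1 \le \Lambda \le \Lambda' \implies \Pi(\Lambda)\Pi(\Lambda') = \Pi(\Lambda')\Pi(\Lambda) = \Pi(\Lambda)$.
Finally, we assume that the projections $\Pi(\Lambda)$ are "smoothing operators" satisfying the following estimates:
Polynomial growth and approximation: There are constants $A_1, A_2 \ge 1$ such that, for all numbers $0 \le s \le S$, all $\Lambda \in [1, \infty)$ and all $u \in V_s$, we have:
(2.1) $\|\Pi(\Lambda)u\|_{s+d} \le A_1 \Lambda^{d} \|u\|_s$ for all $d \in [0, S-s]$,
(2.2) $\|(1-\Pi(\Lambda))u\|_{s-d} \le A_2 \Lambda^{-d} \|u\|_s$ for all $d \in [0, s]$.
When the above properties are met, we shall say that $(V_s, \|\cdot\|_s)_{0\le s\le S}$ endowed with the family of projectors $(\Pi(\Lambda))_{\Lambda\in[1,\infty)}$ is a tame Banach scale. It is well-known (see e.g. [10]) that (2.1, 2.2) imply the interpolation inequality:
(2.3) $\|u\|_{t s_1 + (1-t) s_2} \le A_3 \|u\|_{s_1}^{t} \|u\|_{s_2}^{1-t}$ for all $0 \le s_1 \le s_2 \le S$ and $t \in [0,1]$.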
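A standard model for such smoothing projectors is Fourier truncation on a scale of Sobolev-type sequence spaces. The following sketch checks numerically that truncation at frequency $\Lambda$ satisfies a polynomial growth bound $\|\Pi(\Lambda)u\|_{s+d} \le A_1\Lambda^d\|u\|_s$ and an approximation bound $\|(1-\Pi(\Lambda))u\|_s \le A_2\Lambda^{-d}\|u\|_{s+d}$. The form of these two bounds, the weights $(1+k^2)^s$, the constants $A_1 = 2^{d/2}$, $A_2 = 1$ and all function names are assumptions of the sketch, chosen because they are the usual ones for this model.

```python
import math
import random

def sobolev_norm(coeffs, s):
    """Weighted l2 norm: (sum over k of (1 + k^2)^s |c_k|^2)^(1/2)."""
    return math.sqrt(sum((1 + k * k) ** s * abs(c) ** 2
                         for k, c in coeffs.items()))

def truncate(coeffs, lam):
    """Pi(lam): keep the Fourier modes with |k| <= lam."""
    return {k: c for k, c in coeffs.items() if abs(k) <= lam}

def tail(coeffs, lam):
    """1 - Pi(lam): keep the Fourier modes with |k| > lam."""
    return {k: c for k, c in coeffs.items() if abs(k) > lam}

def check_smoothing(coeffs, lam, s, d):
    """Return (growth_ok, approx_ok) for the two assumed estimates."""
    A1 = 2.0 ** (d / 2.0)   # valid because 1 + lam^2 <= 2 lam^2 for lam >= 1
    A2 = 1.0
    slack = 1.0 + 1e-9      # tolerance for floating-point rounding
    growth_ok = (sobolev_norm(truncate(coeffs, lam), s + d)
                 <= slack * A1 * lam ** d * sobolev_norm(coeffs, s))
    approx_ok = (sobolev_norm(tail(coeffs, lam), s)
                 <= slack * A2 * lam ** (-d) * sobolev_norm(coeffs, s + d))
    return growth_ok, approx_ok

def random_coeffs(n_modes=64, decay=3.0, seed=0):
    """Random real Fourier coefficients with polynomial decay."""
    rng = random.Random(seed)
    return {k: rng.uniform(-1.0, 1.0) / (1 + abs(k)) ** decay
            for k in range(-n_modes, n_modes + 1)}

if __name__ == "__main__":
    u = random_coeffs()
    for lam in (1.0, 4.0, 16.0):
        print(lam, check_smoothing(u, lam, s=1.0, d=2.0))
```

The kernel of the truncation shrinks and its range grows as $\Lambda$ increases, so this model also has the monotonicity properties required of the family $\Pi(\Lambda)$.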
2.2. Main theorem. We state our result in the framework of singular perturbations, in the spirit of Texier and Zumbrun [26]. The norms $\|\cdot\|_s$, $\|\cdot\|'_s$ on the tame scales $(V_s)$, $(W_s)$ may depend on the perturbation parameter $\varepsilon \in (0, 1]$, as well as the projectors $\Pi(\Lambda)$, $\Pi'(\Lambda)$ and their ranges $E(\Lambda)$, $E'(\Lambda)$. But we impose that $S$ and the constants $A_i$, $A'_i$ appearing in estimates (2.1, 2.2, 2.3) be independent of $\varepsilon$. In order to avoid burdensome notations, the dependence of the norms, projectors and subspaces on $\varepsilon$ will not be explicit in the sequel.
Denote by $B_s$ the unit ball in $V_s$: $B_s = \{u \in V_s : \|u\|_s \le 1\}$. In the sequel we fix nonnegative constants $s_0$, $m$, $\ell$, $\ell'$ and $g$, independent of $\varepsilon$. We will assume that $S$ is large enough.
We first recall the definition of Gâteaux-differentiability, in a form adapted to our framework:
Definition 1. We shall say that a function F :
Note that, even in finite dimension, a G-differentiable map need not be $C^1$, or even continuous. However, if $DF : V_{s+m} \to \mathcal{L}(V_{s+m}, W_s)$ is locally bounded, then $F : V_{s+m} \to W_s$ is locally Lipschitz, hence continuous. In the present paper, we will always be in such a situation.
We now consider a family of maps $(F_\varepsilon)_{0<\varepsilon\le1}$ with $F_\varepsilon : B_{s_0+m} \to W_{s_0}$. We are ready to state our assumptions on this family:
• We shall say that the maps $F_\varepsilon : B_{s_0+m} \to W_{s_0}$ ($0 < \varepsilon \le 1$) form an S-tame differentiable family if they are G-differentiable with respect to $u$, and, for some positive constant $a$, for all $\varepsilon \in (0, 1]$ and all s ∈ and for all $s_0 \le s \le S - \max\{\ell, \ell'\}$, we have the tame inverse estimate:
In this definition, the integers $m$, $\ell$, $\ell'$ denote the loss of derivatives for $DF_\varepsilon$ and its right-inverse, and $g > 0$ denotes the strength of the singularity at $\varepsilon = 0$. The unperturbed case of a fixed map can be recovered by setting $\varepsilon = 1$.
We want to solve the equation $F_\varepsilon(u) = v$. There are three things to look for. How regular is $v$? How regular is $u$, or, equivalently, how small is the loss of derivatives between $v$ and $u$? How does the existence domain depend on $\varepsilon$? The following result answers them in a near-optimal way.
Theorem 3. Assume that the maps $F_\varepsilon$ ($0 < \varepsilon \le 1$) form an S-tame differentiable family between the tame scales $(V_s)_{0\le s\le S}$ and $(W_s)_{0\le s\le S}$, with $F_\varepsilon(0) = 0$ for all $0 < \varepsilon \le 1$. Assume, in addition, that $(DF_\varepsilon)_{0<\varepsilon\le1}$ is either tame right-invertible or tame Galerkin right-invertible. Let $s_0$, $m$, $g$, $\ell$, $\ell'$ be the associated parameters.
Then, for $S$ large enough, there is $r > 0$ such that, whenever $0 < \varepsilon \le 1$ and $\|v\|'_\delta \le r\varepsilon^{g'}$, there exists some $u_\varepsilon \in B_{s_1}$ satisfying:
As we will see, the proof of Theorem 3 is much shorter under the assumption that $DF_\varepsilon$ is Galerkin right-invertible. But in many applications, it is easier to check that $DF_\varepsilon$ is tame right-invertible than tame Galerkin right-invertible. See [10], however, where an assumption similar to (2.7, 2.8) is used.
All other "hard" surjection theorems that we know of require some additional conditions on the second derivative of $F_\varepsilon$. Here we do not need such assumptions: in fact, we only assume $F_\varepsilon$ to be G-differentiable, not $C^2$.
As for the three questions we raised, let us explain in which sense the answers are almost optimal in Theorem 3. For the tame estimates (2.4), (2.6) to hold, one needs $u \in B_{s_1}$ with $s_1 \ge s_0 + \max\{m, \ell\}$. When solving the linearized equation, the right-inverse loses $\ell'$ derivatives, so it seems necessary to assume $\delta \ge s_1 + \ell'$, and we find that the strict inequality is sufficient. Replacing $s_1$ with its minimal value, our condition on $\delta$ becomes $\delta > s_0 + \max\{m, \ell\} + \ell'$. We have not found this condition in the literature: in [17] for instance, a stronger assumption is made, namely $\delta > s_0 + \max\{2m + \ell', \ell\} + \ell'$.
For the dependence of $\|v\|'_\delta$ on $\varepsilon$, the constraint $g' > g$ also seems to be nearly optimal. Indeed, the solution $u_\varepsilon$ has to be in $B_{s_1}$, but the right-inverse $L_\varepsilon$ of $DF_\varepsilon$ has a norm of order $\varepsilon^{-g}$, so the condition $\|v\|'_\delta \lesssim \varepsilon^{g}$ seems necessary. We find that the condition $\|v\|'_\delta \lesssim \varepsilon^{g'}$ is sufficient. Our condition on $S$ is of the form $S \ge S_0$ where $S_0$ depends only on the parameters $s_0$, $m$, $g$, $\ell$, $\ell'$ and $g'$, $s_1$, $\delta$. Then $r$ depends only on these parameters and the constants $A_i$, $A'_i$ associated with the tame scales. In principle, all these constants could be made explicit, but we will not do it here. Let us just mention that one can take $S_0 = O\left(\frac{1}{g'-g}\right)$ as $g' \to g$, all other parameters remaining fixed. This follows from the inequality $\sigma < \zeta g/\eta$ in Lemma 1.
In the case of a tame right-invertible differential, we can restate our theorem in a form that allows direct comparison with [26]: Theorem 2.19 and Remarks 2.9, 2.14.

Proof of Theorem 3
The proof consists in constructing a sequence $(u_n)_{n\ge1}$ which converges to a solution $u$ of $F(u) = v$. At each step, in order to find $u_n$, we solve a nonlinear equation in a Banach space, using Theorem 2 in [12], which we restate below for the reader's convenience (the notation $|||L|||$ stands for the operator norm of any linear continuous map $L$ between two Banach spaces):
Theorem 5. Let $X$ and $Y$ be Banach spaces. Let $f : B_X(0, R) \to Y$ be continuous and Gâteaux-differentiable, with $f(0) = 0$. Assume that the derivative $Df(u)$ has a right-inverse $L(u)$, uniformly bounded on the ball $B_X(0, R)$: $|||L(u)||| \le M$ for all $u \in B_X(0, R)$. Then, for every $v \in Y$ with $\|v\| < R/M$, there is some $u \in B_X(0, R)$ such that $f(u) = v$.
Note first that this is a local surjection theorem, not an inverse function theorem: with respect to the IFT, we lose uniqueness. On the other hand, the regularity requirement on $f$ and the smallness condition on $v$ are much weaker. As mentioned in the Introduction, for a $C^1$ functional in finite dimensions, this theorem was proved long ago by Wazewski [27] by a continuation argument (we thank Sotomayor for drawing our attention to this result). For a comparison of the existence and uniqueness domains in the $C^2$ case with $\dim X = \dim Y$, see [16], chapter II, exercise 2.3.
It turns out that the proof of Theorem 3 is much easier if one assumes that the family $(DF_\varepsilon)$ is tame Galerkin right-invertible. But most applications require that $(DF_\varepsilon)$ be tame right-invertible. Let us explain why the proof is longer in this case. In our algorithm, we will use two sequences of projectors $\Pi_n := \Pi(\Lambda_n)$ and $\Pi'_n := \Pi'(M_n)$, with ranges $E_n$ and $E'_n$, where $\Lambda_n = \Lambda_{n-1}^{\alpha}$ for some $\alpha > 1$ close to 1, and $M_n = \Lambda_n^{\vartheta}$ for some $\vartheta \le 1$ such that $\vartheta\alpha > 1$. The algorithm consists in finding, by induction on $n$ and using Theorem 5 at each step, a solution $u_n \in E_n$ of the problem $\Pi'_n F_\varepsilon(u_n) = \Pi'_{n-1} v$. For this, we need $\Pi'_n DF_\varepsilon(u)|_{E_n}$ to be right-invertible for $u$ in a certain ball $B_n$, with estimates on the right inverse for a certain norm $\|\cdot\|_{N_n}$.
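The interplay between the two cutoff sequences can be tabulated with a short script. The recursion $\Lambda_n = \Lambda_{n-1}^{\alpha}$ used below is an assumption of this sketch (it is the growth law consistent with the relation $M_n = \Lambda_n^{\vartheta}$ and the requirement $\vartheta\alpha > 1$); the function name and the sample values of $\Lambda_1$, $\alpha$, $\vartheta$ are illustrative.

```python
def cutoff_sequences(lambda_1, alpha, theta, n_steps):
    """Return ([Lambda_1..Lambda_N], [M_1..M_N]) with
    Lambda_n = Lambda_{n-1}**alpha (assumed growth law) and
    M_n = Lambda_n**theta."""
    if not (lambda_1 > 1 and alpha > 1 and 0 < theta <= 1 and alpha * theta > 1):
        raise ValueError("need lambda_1 > 1, alpha > 1, 0 < theta <= 1, alpha*theta > 1")
    lambdas = [float(lambda_1)]
    for _ in range(n_steps - 1):
        lambdas.append(lambdas[-1] ** alpha)
    ms = [lam ** theta for lam in lambdas]
    return lambdas, ms

if __name__ == "__main__":
    # Sample parameters, chosen arbitrarily: alpha close to 1, theta < 1.
    lambdas, ms = cutoff_sequences(lambda_1=10.0, alpha=1.2, theta=0.9, n_steps=8)
    for n, (lam, m) in enumerate(zip(lambdas, ms), start=1):
        print(n, round(lam, 2), round(m, 2))
```

With $\vartheta < 1$ and $\vartheta\alpha > 1$ one gets $\Lambda_{n-1} < M_n < \Lambda_n$ at every step: the target cutoff $M_n$ stays strictly below the source cutoff $\Lambda_n$, yet still dominates the previous source cutoff.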
When the family $(DF_\varepsilon)$ is tame Galerkin right-invertible, we can take $\vartheta = 1$ so that $M_n = \Lambda_n$, instead of assuming $\vartheta < 1$. Then the right-invertibility of $\Pi'_n DF_\varepsilon(u)|_{E_n}$ follows immediately from the definition. But when $(DF_\varepsilon)$ is only tame right-invertible, it is crucial to take $\vartheta < 1$. The intuitive idea is the following. One can think of $DF_\varepsilon(u)$ as a very large right-invertible matrix. The topological argument we use requires $\Pi'_n DF_\varepsilon(u)|_{E_n}$ to have a right inverse for $u$ in a suitable ball. If we take $M_n = \Lambda_n$, this is like asking that a square submatrix of a right-invertible matrix be invertible. In general this is not true. But a rectangular submatrix, with more columns than rows, will be right-invertible if the full matrix is and if there are enough columns in the submatrix. This is why we impose $M_n < \Lambda_n$ when we do not assume the tame Galerkin right-invertibility.
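The matrix heuristic can be checked on a small finite-dimensional example. In the sketch below, the cyclic shift matrix is an illustrative choice (not from the paper): it is invertible, its leading square blocks are singular, and adding a single extra column makes each block right-invertible. A matrix is right-invertible exactly when its rank equals its number of rows; the helper names are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_right_invertible(rows):
    """A matrix has a right inverse iff it has full row rank."""
    return rank(rows) == len(rows)

def cyclic_shift(n):
    """n x n permutation matrix with ones at positions (i, i+1 mod n)."""
    return [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]

def submatrix(rows, n_rows, n_cols):
    """Top-left block with n_rows rows and n_cols columns."""
    return [row[:n_cols] for row in rows[:n_rows]]

if __name__ == "__main__":
    A = cyclic_shift(6)
    print("full matrix:", is_right_invertible(A))
    for k in range(1, 6):
        print(k,
              "square:", is_right_invertible(submatrix(A, k, k)),
              "rectangular:", is_right_invertible(submatrix(A, k, k + 1)))
```

Here the rows play the role of the target cutoff $M_n$ and the columns that of the source cutoff $\Lambda_n$, so the rectangular blocks correspond to the choice $M_n < \Lambda_n$.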
In the sequel, we assume that the family (DF ε ) is tame right-invertible, so we take ϑ < 1, and we point out the specific places where the arguments would be easier assuming, instead, that (DF ε ) is tame Galerkin right-invertible.
The sequence u n depends on a number of parameters η, α, β, ϑ and σ satisfying various conditions: in the first subsection we prove that these conditions are compatible. In the next one, we construct an initial point u 1 depending on η, α and ϑ. In the third one we construct, by induction, the remaining points u n which also depend on β and σ. Finally we prove that the sequence (u n ) converges to a solution u of the problem, satisfying the desired estimates.

3.1. Choosing the values of the parameters. We are given $s_1 \ge s_0 + \max\{m, \ell\}$, $\delta > s_1 + \ell'$ and $g' > g$. These are fixed throughout the proof.
We introduce positive parameters η, α, β, ϑ and σ satisfying the following conditions: Note that condition (3.3) implies that δ < σ . Note also that condition (3.7) may be rewritten as which implies the simpler inequality Inequality (3.10) will also be used in the proof.
Remark. As already mentioned, if we assume that (DF ε ) is tame Galerkin right-invertible, (3.3) can be replaced by the condition δ < σ, and (3.5) and (3.6) are not needed. The remaining conditions can be satisfied by taking ϑ = 1 and for a larger set of the other parameters. The corresponding variant of Lemma 1 has a simpler proof. We can choose α > 1 such that δ > s 0 + α (s 1 − s 0 + ℓ") and τ such that 0 < τ < 1 α (δ − s 0 ) − s 1 + s 0 − ℓ", and we may impose condition (3.11). Then conditions (3.12), (3.13) and (3.14) are no longer required, and the last conditions δ < σ and (3.15) are easily satisfied by taking σ large enough.
The values (η, α, β, ϑ, σ) are now fixed. For the remainder of the proof we introduce an important notation. By $x \lesssim y$ we mean that there is some constant $C$ such that $x \le Cy$. This constant depends on $A_i$, $A'_i$, $a$, $b$, $s_0$, $m$, $\ell$, $\ell'$, $g$, $g'$, $s_1$, $\delta$ and our additional parameters (η, α, β, ϑ, σ), but NOT on $\varepsilon$, nor on the regularity index $s \in [0, S]$ or the rank $n$ in any of the sequences which will be introduced in the sequel. For instance, the tame inequalities become:
In the iteration process, we will need the following result:
Lemma 2. If the maps $F_\varepsilon$ form an S-tame differentiable family and $F_\varepsilon(0) = 0$, then, for $u \in B_{s_0+m} \cap V_{s+m}$ and $s_0 \le s \le S - m$, we have:
Proof. Consider the function $\varphi(t) = \|F_\varepsilon(tu)\|'_s$. Since $F_\varepsilon$ is G-differentiable, we have: and since $\varphi(0) = 0$, we get the result.

We set
We choose the following norms on E 1 , E ′ 1 : Endowed with these norms, E 1 and E ′ 1 are Banach spaces. We shall use the notation ||| L ||| N 1 for the operator norm of any linear continuous map L from the Banach space E ′ 1 to a Banach space that can be either E 1 or E ′ 1 . The map F ε induces a map f 1 : B s0+m ∩ E 1 → E ′ 1 defined by f 1 (u) := Π ′ 1 F ε (u) for u ∈ B s0+m ∩ E 1 . Note that f 1 (0) = 0. We will use the local surjection theorem to show that the range of f 1 covers a neighbourhood of 0 in E ′ 1 . We begin by showing that Df 1 has a right inverse.
Note that, if we assume that DF is tame Galerkin right-invertible, we can take M 1 = Λ 1 ≥ Λ, and Df 1 is automatically right-invertible, with the tame estimate (2.8). So the next subsection is only necessary if we assume that DF is tame right-invertible.
We will construct a sequence u n ∈ E n , n ≥ 1, starting from the initial point u 1 we found in the preceding section. For all n ≥ 2 the remaining points should satisfy the following conditions: We proceed by induction. Suppose we have found u 2 , ..., u n−1 satisfying these conditions. We want to construct u n . Lemma 4. Let us impose K ≥ 2. For all t with s 0 ≤ t < σ − αβ, and all i with 2 ≤ i ≤ n − 1, we have: where Σ (t) is finite and independent of n , ε.
Proof. By the interpolation formula, By (3.4) we can take t = s 1 , and we find a uniform bound for u n−1 in the s 1 -norm, namely: In particular, we will have , so the tame estimates hold at u n−1 .
we find a uniform bound in the σ-norm. We have: so, combining this with (3.21), we get:

3.3.2. Setting up the induction step.
and that $u_2, \ldots, u_{n-1}$ have been found. We have seen that $\|u_{n-1}\|_{s_1} \le 1$, so that the tame estimates hold at $u_{n-1}$, and we also have $\|u_{n-1}\|_\sigma \lesssim \Lambda_{n-1}^{\beta}$. We want to find $u_n$ satisfying (3.22), (3.23) and (3.24). Since $\Pi'_{n-1} F_\varepsilon(u_{n-1}) = \Pi'_{n-2} v$, we rewrite the latter equation as follows: (3.26) can be rewritten as follows:
We choose the following norms on $E_n$ and $E'_n$: Endowed with these norms, $E_n$ and $E'_n$ are Banach spaces. We shall use the notation $|||L|||_{N_n}$ for the operator norm of any linear continuous map $L$ from the Banach space $E'_n$ to a Banach space that can be either $E_n$ or $E'_n$.
We will solve the system (3.27), (3.28), (3.29) by applying the local surjection theorem to f n on the ball B Nn (0, r n ) ⊂ E n where: Note that if the solution z belongs to B Nn (0, r n ), then In other words, u n = u n−1 + z satisfies (3.23) and (3.24), so that the induction step is proved.
We begin by showing that Df n (z) has a right inverse.
Note that, if we assume that DF ε is tame Galerkin right-invertible, we can take M n = Λ n , and the result of the next subsection is obvious. This subsection is only useful if we assume that DF is tame right-invertible but not tame Galerkin right-invertible.

3.3.3. $Df_n(z)$ has a right inverse for $\|z\|_{N_n} \le r_n$. In this subsection, we use conditions (3.5) and (3.6). We recall them for the reader's convenience: αη
Take now any $z \in B_{N_n}(0, r_n)$. Arguing as above, we find that if , then: By (3.31) the tame estimates hold on $z \in B_{N_n}(0, r_n)$.
Lemma 6. Take $\Lambda_1 = K\varepsilon^{-\eta}$ with $K > 1$ chosen large enough, independently of $n$ and $\varepsilon \in (0, 1]$. Then, for all $z \in B_{N_n}(0, r_n)$:
We have: $\|h\|_{s_0+m} \lesssim \Lambda_n^{-\sigma+s_0+m} \|L_\varepsilon(u_{n-1}+z)k\|_\sigma$. By (3.32) and the tame estimates for $L_\varepsilon$, we get: where we have used Lemma 5. Substituting in the preceding formula, we get: By the tame estimate (2.4), we have: From this it follows that: Hence: Since $\alpha\vartheta > 1$, the dominant term in the parenthesis is the second one, and: From (3.5) and (3.6), it follows that the right-hand side is a decreasing function of $n$. To check that it is less than 1/2 for all $n \ge 2$, it is enough to check it for $n = 2$. Since $\Lambda_1 = K\varepsilon^{-\eta}$, substituting in the right-hand side, we get: By (3.5) and (3.6), both exponents $C_1$ and $C_2$ are larger than $g/\eta$. As a consequence, $\|\Pi'_n DF_\varepsilon(u_{n-1}+z)h\|'_{N_n} \le \frac{1}{2}\|k\|'_{N_n}$ for $K$ chosen large enough, independently of $n$ and $0 < \varepsilon \le 1$.
Arguing as in subsection 3.2.2, we find that the Neumann series $\sum_{i\ge0} \left(I_{E'_n} - Df_n(u)L_n(u)\right)^i$ converges in operator norm.
Its sum is $S_n(u) = (Df_n(u)L_n(u))^{-1}$ and it has operator norm at most 2. Then $T_n(u) := L_n(u)S_n(u)$ is a right inverse of $Df_n(u)$, with the estimate $|||T_n(u)|||_{N_n} \le 2\,|||L_n(u)|||_{N_n}$. We have already derived estimate (3.33) which immediately implies: From the tame estimates and Lemma 5, we also have: Since $\alpha\vartheta > 1$, we have $\Lambda_{n-1}^{\ell'} \lesssim M_n^{\ell'}$. So the two preceding estimates can be combined, and we get the final estimate for the right inverse in operator norm: We recall them for the reader's convenience:
Let us go back to (3.27). By Theorem 5, to solve $\Pi'_n f_n(z) = \Delta_n v + e_n$ with $z \in B_{N_n}(0, r_n)$ it is enough that: Here $r_n$ is given by (3.30). We can estimate $|||T_n(z)|||_{N_n}$ using (3.34). We need to estimate $\|\Delta_n v\|'_{N_n}$ and $\|e_n\|'_{N_n}$.
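The Neumann-series correction of an approximate right inverse, used above to pass from $L_n(u)$ to $T_n(u) = L_n(u)S_n(u)$, can be illustrated on small matrices. In this sketch the matrices `A` (playing the role of $Df_n(u)$) and `L` (an approximate right inverse with $\|I - AL\| \le 1/2$) are arbitrary illustrative choices, and the maximum-row-sum norm stands in for the operator norms of the proof; all helper names are hypothetical.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def norm_inf(A):
    """Operator norm induced by the sup norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

def neumann_right_inverse(A, L, n_terms=80):
    """Given L with ||I - A L|| <= 1/2, return T = L * sum_i (I - A L)^i.

    Then A T = I up to the truncation error of the series, and
    ||T|| <= 2 ||L||.
    """
    n = len(A)
    E = sub(identity(n), matmul(A, L))      # defect I - A L
    if norm_inf(E) > 0.5:
        raise ValueError("defect too large for the 1/2 bound")
    S = identity(n)                          # partial sum of the series
    power = identity(n)
    for _ in range(n_terms):
        power = matmul(power, E)
        S = add(S, power)
    return matmul(L, S)

if __name__ == "__main__":
    A = [[1.0, 0.0, 0.3],
         [0.0, 1.0, 0.2]]                    # 2 x 3, right-invertible
    L = [[0.9, 0.1],
         [0.0, 1.1],
         [0.0, 0.0]]                         # approximate right inverse
    T = neumann_right_inverse(A, L)
    print("A T =", matmul(A, T))
    print("||T|| =", norm_inf(T), " 2||L|| =", 2 * norm_inf(L))
```

The factor 2 in the bound on $T$ comes from summing the geometric series $\sum_i 2^{-i}$, exactly as in the estimate for $S_n(u)$.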
We conclude that F ε (u ε ) = v, as desired, and this ends the proof of Theorem 3.
4. An application of the singular perturbation theorem
4.1. The result. In this section, we consider a Cauchy problem for nonlinear Schrödinger systems arising in nonlinear optics, a question recently studied by Métivier-Rauch [21] and Texier-Zumbrun [26]. Métivier-Rauch proved the existence of local in time solutions, with an existence time $T$ converging to 0 when the $H^s$ norm of the initial datum goes to infinity. Texier-Zumbrun, thanks to their version of the Nash-Moser theorem adapted to singular perturbation problems, were able to find a uniform lower bound on $T$ for certain highly concentrated initial data.
The H s norm of these initial data could go to infinity. By applying our "semiglobal" version of the Nash-Moser theorem, we are able to extend Texier-Zumbrun's result to even larger initial data. In the sequel we follow closely their exposition, but some parameters are named differently to avoid confusions with our other notations.
The problem takes the following form: $A(\partial_x) = \mathrm{diag}(\lambda_1, \cdots, \lambda_n, -\lambda_1, \cdots, -\lambda_n)\Delta_x$ and $\mathcal{B} = \begin{pmatrix} B & C \\ \bar{C} & \bar{B} \end{pmatrix}$. The coefficients $b_{jj'}$, $c_{jj'}$ of the $n \times n$ matrices $B$, $C$ are first-order operators with smooth coefficients: and $c_{kjj'}$ smooth complex-valued functions of $u$ satisfying, for some integer $p \ge 2$, some $C > 0$, all $0 \le |\alpha| \le p$ and all $u = (\psi, \bar{\psi}) \in \mathbb{C}^{2n}$: Moreover, we assume that the following "transparency" conditions hold: the functions $b_{kjj}$ are real-valued, the coefficients $\lambda_j$ are real and pairwise distinct, and for any $j, j'$ such that $\lambda_j + \lambda_{j'} = 0$, $c_{jj'} = c_{j'j}$.
We consider initial data of the form $\varepsilon^{\kappa}(a_\varepsilon(x), \bar{a}_\varepsilon(x))$ with $a_\varepsilon(x) = a_1(x/\varepsilon)$ where $0 < \varepsilon \le 1$, $a_1 \in H^S(\mathbb{R}^d)$ for some $S$ large enough and $\|a_1\|_{H^S}$ small enough.
Our goal is to prove that the Cauchy problem has a solution on $[0, T] \times \mathbb{R}^d$ for all $0 < \varepsilon \le 1$, with $T > 0$ independent of $\varepsilon$. Texier-Zumbrun obtain existence and uniqueness of the solution, under some conditions on $\kappa$, which should be large enough. This corresponds to a smallness condition on the initial datum when $\varepsilon$ approaches zero. Our local surjection theorem only provides existence, but our condition on $\kappa$ is less restrictive, so our initial datum is allowed to be larger. Note that, once existence is proved, uniqueness is easily obtained for this Cauchy problem: indeed, local-in-time uniqueness implies global-in-time uniqueness. Our result is the following:
Theorem 6. Under the above assumptions and notations, let us impose the additional condition .
for $S$ large enough, and $\|a_1\|_{H^S}$ is small enough, then the Cauchy problem (4.1) has a unique solution in the functional
Métivier-Rauch already provide existence for a fixed positive $T$ when $\kappa \ge 1$. So we obtain something new in comparison with them when $\frac{d}{2}\,\frac{1}{p-1} < 1$, that is, when $p > 1 + \frac{d}{2}$. Let us now compare our results with those of Texier-Zumbrun [26]. In order to do so, we consider the same particular values as in their Remark 4.7 and Examples 4.8, 4.9 pages 517-518. Let us illustrate this in 2 and 3 space dimensions.
Remark. After reading our paper, Baldi and Haus [6] have been able to relax even further the condition on κ, based on their version [5] of the classical Newton scheme in the spirit of Hörmander. A key point in their proof is a clever modification of the norms considered by Texier-Zumbrun, allowing better C 2 estimates on the functional. They also explain that their approach can be extended to other C 2 functionals consisting of a linear term perturbed by a nonlinear term of homogeneity at least p + 1. Our abstract theorem, however, seems more general since we do not need such a structure.

4.2. Proof of Theorem 6. We have to show that our Corollary 4 applies. Our functional setting is the same as in [26], with slightly different notations.
So we see that the assumptions of Lemma 4.4 in [26] are satisfied by the parameters γ 0 = γ 1 = γ (note that our exponent p is denoted ℓ in [26]). The direct estimate (2.9) thus follows from Lemma 4.4 in [26]. Note that Lemma 4.4 of [26] also gives an estimate on the second derivative of Φ ε (·), but we do not need such an estimate.
Since $d \ge 2$, this inequality is a consequence of our assumption $\kappa \ge \frac{d}{2(p-1)}$. So our Corollary 4 implies the existence of a solution to the Cauchy problem (4.1). The uniqueness of this solution comes from the local-in-time uniqueness of solutions to the Cauchy problem. This proves Theorem 6 as a consequence of Corollary 4.

Remark. In the examples 4.8 and 4.9 of [26], Texier and Zumbrun also study the case of oscillating initial data, i.e. $a_\varepsilon = a(x)e^{ix\cdot\xi_0/\varepsilon}$, and in the first submitted version of this paper we considered it as well. However, a referee pointed out to us that the corresponding statements were not fully justified in [26]. Indeed, in the proof of their Theorem 4.6, Texier and Zumbrun have to invert the linearized functional $D\Phi_\varepsilon(u)$ for $u$ in a neighborhood of the function $a_\varepsilon$, denoted $a_f$ in their paper. For this purpose, it seems that they need the norm of their function $a_f$ to be controlled by $\varepsilon^{\gamma}$. This condition appears in their Remark 2.14 and their Lemma 4.5, but not in the statement of their Theorem 4.6. This additional constraint does not affect their results for concentrating initial data in Examples 4.8, 4.9. But in the oscillating case, their statements seem overly optimistic. We did not want to investigate this issue further, which is why we only deal with the concentrating case. Note, however, that this difficulty with the oscillating case is overcome in the recent work [6], thanks to improved norms and estimates.

Conclusion
The purpose of this paper has been to introduce a new algorithm into the "hard" inverse function theorem, where both DF (u) and its right inverse L (u) lose derivatives, in order to improve its range of validity. To highlight this improvement, we have considered singular perturbation problems with loss of derivatives. We have shown that, on the specific example of a Schrödinger-type system of PDEs arising from nonlinear optics, our method leads to substantial improvements of known results. We believe that our approach has the potential of improving the known estimates in many other "hard" inversion problems.
In the statement and proof of our abstract theorem, our main focus has been the existence of $u$ solving $F(u) = v$ in the case when $S$ is large and the regularity of $v$ is as low as possible. We have not tried to give an explicit bound on $S$, but with some additional work, it can be done. In an earlier version [13] of this paper, the reader will find a study of the intermediate case of a tame Galerkin right-invertible differential $DF$, with precise estimates on the parameter $S$ depending on the loss of regularity of the right-inverse, in the special case $s_0 = m = 0$ and $\ell = \ell'$.